1 Distribution Overview

1.1 Discrete Distributions

Distribution | Notation [1] | $F_X(x)$ | $f_X(x)$ | $E[X]$ | $V[X]$ | $M_X(s)$
Uniform | $\mathrm{Unif}\{a,\dots,b\}$ | $0$ if $x < a$; $\frac{\lfloor x \rfloor - a + 1}{b - a + 1}$ if $a \le x \le b$; $1$ if $x > b$ | $\frac{I(a \le x \le b)}{b - a + 1}$ | $\frac{a+b}{2}$ | $\frac{(b-a+1)^2 - 1}{12}$ | $\frac{e^{as} - e^{(b+1)s}}{(b-a+1)(1 - e^{s})}$
Bernoulli | $\mathrm{Bern}(p)$ | $(1-p)^{1-x}$ | $p^{x}(1-p)^{1-x}$ | $p$ | $p(1-p)$ | $1 - p + pe^{s}$

[1] We use the notation $\gamma(s,x)$ and $\Gamma(x)$ to refer to the Gamma functions (see §22.1), and use $B(x,y)$ and $I_x$ to refer to the Beta functions (see §22.2).


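1.2 Continuous Distributions

Distribution | Notation | $F_X(x)$ | $f_X(x)$ | $E[X]$ | $V[X]$ | $M_X(s)$
Uniform | $\mathrm{Unif}(a,b)$ | $0$ if $x < a$; $\frac{x-a}{b-a}$ if $a < x < b$; $1$ if $x > b$ | $\frac{I(a < x < b)}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | $\frac{e^{sb} - e^{sa}}{s(b-a)}$
Normal | $N(\mu,\sigma^2)$ | $\Phi(x) = \int_{-\infty}^{x}\phi(t)\,dt$ | $\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$ | $\mu$ | $\sigma^2$ | $\exp\left\{\mu s + \frac{\sigma^2 s^2}{2}\right\}$
Log-Normal | $\ln N(\mu,\sigma^2)$ | $\frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left[\frac{\ln x - \mu}{\sqrt{2\sigma^2}}\right]$ | $\frac{1}{x\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(\ln x - \mu)^2}{2\sigma^2}\right\}$ | $e^{\mu + \sigma^2/2}$ | $(e^{\sigma^2} - 1)\,e^{2\mu + \sigma^2}$ |
Multivariate Normal | $\mathrm{MVN}(\mu,\Sigma)$ | | $(2\pi)^{-k/2}(\det\Sigma)^{-1/2}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$ | $\mu$ | $\Sigma$ | $\exp\left\{\mu^T s + \frac{1}{2}s^T\Sigma s\right\}$
Student's t | $\mathrm{Student}(\nu)$ | $I_{u}\left(\frac{\nu}{2},\frac{\nu}{2}\right)$ with $u = \frac{x + \sqrt{x^2+\nu}}{2\sqrt{x^2+\nu}}$ | $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$ | $0$ ($\nu > 1$) | $\frac{\nu}{\nu-2}$ ($\nu > 2$) |
Chi-square | $\chi^2_k$ | $\frac{1}{\Gamma(k/2)}\,\gamma\left(\frac{k}{2},\frac{x}{2}\right)$ | $\frac{1}{2^{k/2}\Gamma(k/2)}\,x^{k/2-1}e^{-x/2}$ | $k$ | $2k$ | $(1-2s)^{-k/2}$, $s < 1/2$
F | $F(d_1,d_2)$ | $I_{\frac{d_1 x}{d_1 x + d_2}}\left(\frac{d_1}{2},\frac{d_2}{2}\right)$ | $\frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1+d_2}}}}{x\,B\left(\frac{d_1}{2},\frac{d_2}{2}\right)}$ | $\frac{d_2}{d_2-2}$ ($d_2 > 2$) | $\frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ ($d_2 > 4$) |
Exponential | $\mathrm{Exp}(\beta)$ | $1 - e^{-x/\beta}$ | $\frac{1}{\beta}e^{-x/\beta}$ | $\beta$ | $\beta^2$ | $\frac{1}{1-\beta s}$, $s < 1/\beta$
Gamma | $\mathrm{Gamma}(\alpha,\beta)$ | $\frac{\gamma(\alpha, x/\beta)}{\Gamma(\alpha)}$ | $\frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,x^{\alpha-1}e^{-x/\beta}$ | $\alpha\beta$ | $\alpha\beta^2$ | $\left(\frac{1}{1-\beta s}\right)^{\alpha}$, $s < 1/\beta$
Inverse Gamma | $\mathrm{InvGamma}(\alpha,\beta)$ | $\frac{\Gamma(\alpha, \beta/x)}{\Gamma(\alpha)}$ | $\frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1}e^{-\beta/x}$ | $\frac{\beta}{\alpha-1}$ ($\alpha > 1$) | $\frac{\beta^2}{(\alpha-1)^2(\alpha-2)}$ ($\alpha > 2$) | $\frac{2(-\beta s)^{\alpha/2}}{\Gamma(\alpha)}K_{\alpha}\left(\sqrt{-4\beta s}\right)$
Dirichlet | $\mathrm{Dir}(\alpha)$ | | $\frac{\Gamma\left(\sum_{i=1}^{k}\alpha_i\right)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\prod_{i=1}^{k}x_i^{\alpha_i-1}$ | $\frac{\alpha_i}{\sum_{i=1}^{k}\alpha_i}$ | $\frac{E[X_i](1-E[X_i])}{\sum_{i=1}^{k}\alpha_i + 1}$ |
Beta | $\mathrm{Beta}(\alpha,\beta)$ | $I_x(\alpha,\beta)$ | $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | $1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{s^k}{k!}$
Weibull | $\mathrm{Weibull}(\lambda,k)$ | $1 - e^{-(x/\lambda)^k}$ | $\frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}$ | $\lambda\,\Gamma\left(1+\frac{1}{k}\right)$ | $\lambda^2\,\Gamma\left(1+\frac{2}{k}\right) - \mu^2$ | $\sum_{n=0}^{\infty}\frac{s^n\lambda^n}{n!}\,\Gamma\left(1+\frac{n}{k}\right)$
Pareto | $\mathrm{Pareto}(x_m,\alpha)$ | $1 - \left(\frac{x_m}{x}\right)^{\alpha}$, $x \ge x_m$ | $\frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}$, $x \ge x_m$ | $\frac{\alpha x_m}{\alpha-1}$ ($\alpha > 1$) | $\frac{x_m^2\,\alpha}{(\alpha-1)^2(\alpha-2)}$ ($\alpha > 2$) | $\alpha(-x_m s)^{\alpha}\,\Gamma(-\alpha,-x_m s)$, $s < 0$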
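2 Probability Theory

Definitions

• Sample space $\Omega$
• Outcome (point or element) $\omega \in \Omega$
• Event $A \subseteq \Omega$
• $\sigma$-algebra $\mathcal{A}$
  1. $\emptyset \in \mathcal{A}$
  2. $A_1, A_2, \ldots \in \mathcal{A} \implies \bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$
  3. $A \in \mathcal{A} \implies \neg A \in \mathcal{A}$
• Probability distribution $P$
  1. $P[A] \ge 0$ for every $A$
  2. $P[\Omega] = 1$
  3. $P\left[\bigsqcup_{i=1}^{\infty} A_i\right] = \sum_{i=1}^{\infty} P[A_i]$
• Probability space $(\Omega, \mathcal{A}, P)$

Properties

• $P[\emptyset] = 0$
• $B = \Omega \cap B = (A \cup \neg A) \cap B = (A \cap B) \cup (\neg A \cap B)$
• $P[\neg A] = 1 - P[A]$
• $P[B] = P[A \cap B] + P[\neg A \cap B]$
• $P[\Omega] = 1$, $P[\emptyset] = 0$
• $\neg\left(\bigcup_n A_n\right) = \bigcap_n \neg A_n$ and $\neg\left(\bigcap_n A_n\right) = \bigcup_n \neg A_n$ (DeMorgan)
• $P[A \cup B] = P[A] + P[B] - P[A \cap B] \implies P[A \cup B] \le P[A] + P[B]$
• $P[A \cup B] = P[A \cap \neg B] + P[\neg A \cap B] + P[A \cap B]$
• $P[A \cap \neg B] = P[A] - P[A \cap B]$

Continuity of Probabilities

• $A_1 \subset A_2 \subset \ldots \implies \lim_{n\to\infty} P[A_n] = P[A]$ where $A = \bigcup_{i=1}^{\infty} A_i$
• $A_1 \supset A_2 \supset \ldots \implies \lim_{n\to\infty} P[A_n] = P[A]$ where $A = \bigcap_{i=1}^{\infty} A_i$

Independence
$A \perp\!\!\!\perp B \iff P[A \cap B] = P[A]\,P[B]$

Conditional Probability
$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$ if $P[B] > 0$

Law of Total Probability
$P[B] = \sum_{i=1}^{n} P[B \mid A_i]\,P[A_i]$ where $\Omega = \bigsqcup_{i=1}^{n} A_i$

Bayes' Theorem
$P[A_i \mid B] = \frac{P[B \mid A_i]\,P[A_i]}{\sum_{j=1}^{n} P[B \mid A_j]\,P[A_j]}$ where $\Omega = \bigsqcup_{i=1}^{n} A_i$

Inclusion-Exclusion Principle
$P\left[\bigcup_{i=1}^{n} A_i\right] = \sum_{r=1}^{n} (-1)^{r-1} \sum_{i_1 < \cdots < i_r \le n} P\left[\bigcap_{j=1}^{r} A_{i_j}\right]$


3 Random Variables

Random Variable
$X : \Omega \to \mathbb{R}$

Probability Mass Function (PMF)
$f_X(x) = P[X = x] = P[\{\omega \in \Omega : X(\omega) = x\}]$

Probability Density Function (PDF)
$P[a \le X \le b] = \int_a^b f(x)\,dx$

Cumulative Distribution Function (CDF):
$F_X : \mathbb{R} \to [0,1]$, $F_X(x) = P[X \le x]$

1. Nondecreasing: $x_1 < x_2 \implies F(x_1) \le F(x_2)$
2. Normalized: $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to\infty} F(x) = 1$
3. Right-continuous: $\lim_{y \downarrow x} F(y) = F(x)$

Conditional density
$P[a \le Y \le b \mid X = x] = \int_a^b f_{Y|X}(y \mid x)\,dy$, $a \le b$
$f_{Y|X}(y \mid x) = \frac{f(x,y)}{f_X(x)}$

Independence
1. $P[X \le x, Y \le y] = P[X \le x]\,P[Y \le y]$
2. $f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$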

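3.1 Transformations

Transformation function
$Z = \varphi(X)$

Discrete
$f_Z(z) = P[\varphi(X) = z] = P[\{x : \varphi(x) = z\}] = P[X \in \varphi^{-1}(z)] = \sum_{x \in \varphi^{-1}(z)} f_X(x)$

Continuous
$F_Z(z) = P[\varphi(X) \le z] = \int_{A_z} f(x)\,dx$ with $A_z = \{x : \varphi(x) \le z\}$

Special case if $\varphi$ strictly monotone
$f_Z(z) = f_X(\varphi^{-1}(z))\left|\frac{d}{dz}\varphi^{-1}(z)\right| = f_X(x)\left|\frac{dx}{dz}\right| = f_X(x)\frac{1}{|J|}$

The Rule of the Lazy Statistician
$E[Z] = \int \varphi(x)\,dF_X(x)$
$E[I_A(X)] = \int I_A(x)\,dF_X(x) = \int_A dF_X(x) = P[X \in A]$

Convolution

• $Z := X + Y$: $f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x, z-x)\,dx$, which for $X, Y \ge 0$ equals $\int_0^z f_{X,Y}(x, z-x)\,dx$
• $Z := |X - Y|$: $f_Z(z) = 2\int_0^{\infty} f_{X,Y}(x, z+x)\,dx$
• $Z := X/Y$: $f_Z(z) = \int_{-\infty}^{\infty} |y|\,f_{X,Y}(yz, y)\,dy$, which for $X \perp\!\!\!\perp Y$ equals $\int_{-\infty}^{\infty} |y|\,f_X(yz)\,f_Y(y)\,dy$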

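4 Expectation

Expectation

• $E[X] = \mu_X = \int x\,dF_X(x)$, which is $\sum_x x\,f_X(x)$ for $X$ discrete and $\int x\,f_X(x)\,dx$ for $X$ continuous
• In general $E[\varphi(X)] \ne \varphi(E[X])$ (cf. JENSEN inequality)
• $P[X \ge Y] = 1 \implies E[X] \ge E[Y]$, and $P[X = Y] = 1 \implies E[X] = E[Y]$
• $E[X] = \sum_{x=1}^{\infty} P[X \ge x]$ (for $X$ taking values in the nonnegative integers)

Sample mean
$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$

Conditional Expectation

• $E[Y \mid X = x] = \int y\,f(y \mid x)\,dy$
• $E[X] = E[E[X \mid Y]]$
• $E[\varphi(X,Y) \mid X = x] = \int_{-\infty}^{\infty} \varphi(x,y)\,f_{Y|X}(y \mid x)\,dy$
• $E[\varphi(Y,Z) \mid X = x] = \int\!\!\int \varphi(y,z)\,f_{(Y,Z)|X}(y,z \mid x)\,dy\,dz$
• $E[Y + Z \mid X] = E[Y \mid X] + E[Z \mid X]$
• $E[\varphi(X)Y \mid X] = \varphi(X)\,E[Y \mid X]$
• $E[Y \mid X] = c \implies \mathrm{Cov}[X,Y] = 0$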

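5 Variance

Variance

• $V[X] = \sigma_X^2 = E[(X - E[X])^2] = E[X^2] - E[X]^2$
• $V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i] + 2\sum_{i<j}\mathrm{Cov}[X_i, X_j]$
• $V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V[X_i]$ if $X_i \perp\!\!\!\perp X_j$ for all $i \ne j$

Standard deviation
$\mathrm{sd}[X] = \sqrt{V[X]} = \sigma_X$

Covariance

• $\mathrm{Cov}[X,Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]$
• $\mathrm{Cov}[X,a] = 0$
• $\mathrm{Cov}[X,X] = V[X]$
• $\mathrm{Cov}[X,Y] = \mathrm{Cov}[Y,X]$
• $\mathrm{Cov}[aX,bY] = ab\,\mathrm{Cov}[X,Y]$
• $\mathrm{Cov}[X+a,Y+b] = \mathrm{Cov}[X,Y]$
• $\mathrm{Cov}\left[\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right] = \sum_{i=1}^{n}\sum_{j=1}^{m}\mathrm{Cov}[X_i,Y_j]$

Correlation
$\rho[X,Y] = \frac{\mathrm{Cov}[X,Y]}{\sqrt{V[X]\,V[Y]}}$

Independence
$X \perp\!\!\!\perp Y \implies \rho[X,Y] = 0 \iff \mathrm{Cov}[X,Y] = 0 \iff E[XY] = E[X]\,E[Y]$

Sample variance
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$

Conditional Variance

• $V[Y \mid X] = E[(Y - E[Y \mid X])^2 \mid X] = E[Y^2 \mid X] - E[Y \mid X]^2$
• $V[Y] = E[V[Y \mid X]] + V[E[Y \mid X]]$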
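6 Inequalities

CAUCHY-SCHWARZ
$E[XY]^2 \le E[X^2]\,E[Y^2]$

MARKOV
$P[\varphi(X) \ge t] \le \frac{E[\varphi(X)]}{t}$ for $\varphi \ge 0$ and $t > 0$

CHEBYSHEV
$P[|X - E[X]| \ge t] \le \frac{V[X]}{t^2}$

CHERNOFF
$P[X \ge t] \le \inf_{s > 0} e^{-st}\,M_X(s)$

JENSEN
$E[\varphi(X)] \ge \varphi(E[X])$ for $\varphi$ convex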
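The Markov, Chebyshev, and Jensen bounds above can be checked exactly on a small example. The sketch below assumes a fair six-sided die (an illustrative choice, not taken from the text) and uses exact rational arithmetic, so both sides of each inequality are computed without rounding.

```python
from fractions import Fraction

# Illustrative assumption: X is a fair six-sided die, so the PMF is exact.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())               # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # V[X]

def prob(event):
    """Return P[event(X)] under the PMF above."""
    return sum(p for x, p in pmf.items() if event(x))

# Markov with phi(x) = x (nonnegative here): P[X >= t] <= E[X] / t
t = 5
markov_lhs = prob(lambda x: x >= t)
markov_rhs = mean / t

# Chebyshev: P[|X - E[X]| >= s] <= V[X] / s^2
s = 2
cheb_lhs = prob(lambda x: abs(x - mean) >= s)
cheb_rhs = var / s ** 2

# Jensen with the convex function phi(x) = x^2: E[X^2] >= (E[X])^2
jensen_lhs = sum(x ** 2 * p for x, p in pmf.items())
jensen_rhs = mean ** 2

print(markov_lhs, "<=", markov_rhs)
print(cheb_lhs, "<=", cheb_rhs)
print(jensen_lhs, ">=", jensen_rhs)
```

The bounds hold but are loose here, which is typical: they use only the mean or variance and no other feature of the distribution.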


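7 Distribution Relationships

Binomial

• $X_i \sim \mathrm{Bern}(p) \implies \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n,p)$
• $X \sim \mathrm{Bin}(n,p)$, $Y \sim \mathrm{Bin}(m,p)$, $X \perp\!\!\!\perp Y \implies X + Y \sim \mathrm{Bin}(n+m,p)$
• $\lim_{n\to\infty}\mathrm{Bin}(n,p) = \mathrm{Po}(np)$ ($n$ large, $p$ small)
• $\lim_{n\to\infty}\mathrm{Bin}(n,p) = N(np, np(1-p))$ ($n$ large, $p$ far from 0 and 1)

Negative Binomial

• $X \sim \mathrm{NBin}(1,p) = \mathrm{Geo}(p)$
• $X \sim \mathrm{NBin}(r,p) = \sum_{i=1}^{r}\mathrm{Geo}(p)$
• $X_i \sim \mathrm{NBin}(r_i,p) \implies \sum_i X_i \sim \mathrm{NBin}\left(\sum_i r_i, p\right)$
• $X \sim \mathrm{NBin}(r,p)$, $Y \sim \mathrm{Bin}(s+r,p) \implies P[X \le s] = P[Y \ge r]$

Poisson

• $X_i \sim \mathrm{Po}(\lambda_i) \wedge X_i \perp\!\!\!\perp X_j \implies \sum_{i=1}^{n} X_i \sim \mathrm{Po}\left(\sum_{i=1}^{n}\lambda_i\right)$
• $X_i \sim \mathrm{Po}(\lambda_i) \wedge X_i \perp\!\!\!\perp X_j \implies X_i \,\Big|\, \sum_{j=1}^{n} X_j \sim \mathrm{Bin}\left(\sum_{j=1}^{n} X_j, \frac{\lambda_i}{\sum_{j=1}^{n}\lambda_j}\right)$

Exponential

• $X_i \sim \mathrm{Exp}(\beta) \wedge X_i \perp\!\!\!\perp X_j \implies \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(n,\beta)$
• Memoryless property: $P[X > x + y \mid X > y] = P[X > x]$

Normal

• $X \sim N(\mu,\sigma^2) \implies \frac{X-\mu}{\sigma} \sim N(0,1)$
• $X \sim N(\mu,\sigma^2) \wedge Z = aX + b \implies Z \sim N(a\mu + b, a^2\sigma^2)$
• $X \sim N(\mu_1,\sigma_1^2) \wedge Y \sim N(\mu_2,\sigma_2^2) \wedge X \perp\!\!\!\perp Y \implies X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$
• $X_i \sim N(\mu_i,\sigma_i^2)$ independent $\implies \sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$
• $\Phi(-x) = 1 - \Phi(x)$, $\phi'(x) = -x\,\phi(x)$, $\phi''(x) = (x^2 - 1)\,\phi(x)$
• Upper quantile of $N(0,1)$: $z_\alpha = \Phi^{-1}(1-\alpha)$

Gamma

• $X \sim \mathrm{Gamma}(\alpha,\beta) \iff X/\beta \sim \mathrm{Gamma}(\alpha,1)$
• $\mathrm{Gamma}(\alpha,\beta) \sim \sum_{i=1}^{\alpha}\mathrm{Exp}(\beta)$ (integer $\alpha$)
• $X_i \sim \mathrm{Gamma}(\alpha_i,\beta) \wedge X_i \perp\!\!\!\perp X_j \implies \sum_i X_i \sim \mathrm{Gamma}\left(\sum_i \alpha_i, \beta\right)$
• $\frac{\Gamma(\alpha)}{\lambda^{\alpha}} = \int_0^{\infty} x^{\alpha-1}e^{-\lambda x}\,dx$

Beta

• $\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
• $E[X^k] = \frac{B(\alpha+k,\beta)}{B(\alpha,\beta)} = \frac{\alpha+k-1}{\alpha+\beta+k-1}E[X^{k-1}]$
• $\mathrm{Beta}(1,1) \sim \mathrm{Unif}(0,1)$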

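8 Probability and Moment Generating Functions

• $G_X(t) = E[t^X]$, $|t| < 1$
• $M_X(t) = G_X(e^t) = E[e^{Xt}] = E\left[\sum_{i=0}^{\infty}\frac{(Xt)^i}{i!}\right] = \sum_{i=0}^{\infty}\frac{E[X^i]}{i!}\,t^i$
• $P[X = 0] = G_X(0)$
• $P[X = 1] = G_X'(0)$
• $P[X = i] = \frac{G_X^{(i)}(0)}{i!}$
• $E[X] = G_X'(1^-)$
• $E[X^k] = M_X^{(k)}(0)$
• $E\left[\frac{X!}{(X-k)!}\right] = G_X^{(k)}(1^-)$
• $V[X] = G_X''(1^-) + G_X'(1^-) - (G_X'(1^-))^2$
• $G_X(t) = G_Y(t) \implies X \overset{d}{=} Y$


9 Multivariate Distributions

9.1 Standard Bivariate Normal
Let $X, Z \sim N(0,1) \wedge X \perp\!\!\!\perp Z$ with $Y = \rho X + \sqrt{1-\rho^2}\,Z$.

Joint density
$f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)}\right\}$

Conditionals
$(Y \mid X = x) \sim N(\rho x, 1-\rho^2)$ and $(X \mid Y = y) \sim N(\rho y, 1-\rho^2)$

Independence
$X \perp\!\!\!\perp Y \iff \rho = 0$

9.2 Bivariate Normal
Let $X \sim N(\mu_x,\sigma_x^2)$ and $Y \sim N(\mu_y,\sigma_y^2)$.
$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{z}{2(1-\rho^2)}\right\}$
$z = \left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)$

Conditional mean and variance
$E[X \mid Y] = E[X] + \rho\frac{\sigma_X}{\sigma_Y}(Y - E[Y])$
$V[X \mid Y] = \sigma_X^2(1-\rho^2)$

9.3 Multivariate Normal
Covariance Matrix $\Sigma$ (Precision Matrix $\Sigma^{-1}$)
$\Sigma = \begin{pmatrix} V[X_1] & \cdots & \mathrm{Cov}[X_1,X_k] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_k,X_1] & \cdots & V[X_k] \end{pmatrix}$

If $X \sim N(\mu,\Sigma)$,
$f_X(x) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left\{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right\}$

Properties

• $Z \sim N(0,I) \wedge X = \mu + \Sigma^{1/2}Z \implies X \sim N(\mu,\Sigma)$
• $X \sim N(\mu,\Sigma) \implies \Sigma^{-1/2}(X-\mu) \sim N(0,I)$
• $X \sim N(\mu,\Sigma) \implies AX \sim N(A\mu, A\Sigma A^T)$
• $X \sim N(\mu,\Sigma) \wedge a$ is a vector of length $k \implies a^TX \sim N(a^T\mu, a^T\Sigma a)$


10 Convergence

Let $\{X_1, X_2, \ldots\}$ be a sequence of RV's and let $X$ be another RV. Let $F_n$ denote the CDF of $X_n$ and let $F$ denote the CDF of $X$.

Types of Convergence

1. In distribution (weakly, in law): $X_n \xrightarrow{D} X$
   $\lim_{n\to\infty} F_n(t) = F(t)$ for all $t$ where $F$ is continuous
2. In probability: $X_n \xrightarrow{P} X$
   $(\forall\varepsilon > 0)\ \lim_{n\to\infty} P[|X_n - X| > \varepsilon] = 0$
3. Almost surely (strongly): $X_n \xrightarrow{as} X$
   $P\left[\lim_{n\to\infty} X_n = X\right] = P\left[\omega \in \Omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\right] = 1$
4. In quadratic mean ($L_2$): $X_n \xrightarrow{qm} X$
   $\lim_{n\to\infty} E[(X_n - X)^2] = 0$

Relationships

• $X_n \xrightarrow{qm} X \implies X_n \xrightarrow{P} X \implies X_n \xrightarrow{D} X$
• $X_n \xrightarrow{as} X \implies X_n \xrightarrow{P} X$
• $X_n \xrightarrow{D} X \wedge (\exists c \in \mathbb{R})\ P[X = c] = 1 \implies X_n \xrightarrow{P} X$
• $X_n \xrightarrow{P} X \wedge Y_n \xrightarrow{P} Y \implies X_n + Y_n \xrightarrow{P} X + Y$
• $X_n \xrightarrow{qm} X \wedge Y_n \xrightarrow{qm} Y \implies X_n + Y_n \xrightarrow{qm} X + Y$
• $X_n \xrightarrow{P} X \wedge Y_n \xrightarrow{P} Y \implies X_nY_n \xrightarrow{P} XY$
• $X_n \xrightarrow{P} X \implies \varphi(X_n) \xrightarrow{P} \varphi(X)$ ($\varphi$ continuous)
• $X_n \xrightarrow{D} X \implies \varphi(X_n) \xrightarrow{D} \varphi(X)$ ($\varphi$ continuous)
• $X_n \xrightarrow{qm} b \iff \lim_{n\to\infty} E[X_n] = b \wedge \lim_{n\to\infty} V[X_n] = 0$
• $X_1,\ldots,X_n$ IID $\wedge\ E[X] = \mu \wedge V[X] < \infty \implies \bar{X}_n \xrightarrow{qm} \mu$

Slutzky's Theorem

• $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c \implies X_n + Y_n \xrightarrow{D} X + c$
• $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c \implies X_nY_n \xrightarrow{D} cX$
• In general: $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} Y$ do not imply $X_n + Y_n \xrightarrow{D} X + Y$

10.1 Law of Large Numbers (LLN)
Let $\{X_1,\ldots,X_n\}$ be a sequence of IID RV's, $E[X_1] = \mu$, and $V[X_1] < \infty$.

Weak (WLLN)
$\bar{X}_n \xrightarrow{P} \mu$ as $n \to \infty$

Strong (SLLN)
$\bar{X}_n \xrightarrow{as} \mu$ as $n \to \infty$

10.2 Central Limit Theorem (CLT)
Let $\{X_1,\ldots,X_n\}$ be a sequence of IID RV's, $E[X_1] = \mu$, and $V[X_1] = \sigma^2$.
$Z_n := \frac{\bar{X}_n - \mu}{\sqrt{V[\bar{X}_n]}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{D} Z$ where $Z \sim N(0,1)$
$\lim_{n\to\infty} P[Z_n \le z] = \Phi(z)$, $z \in \mathbb{R}$

CLT Notations
$Z_n \approx N(0,1)$
$\bar{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$
$\bar{X}_n - \mu \approx N\left(0, \frac{\sigma^2}{n}\right)$
$\sqrt{n}(\bar{X}_n - \mu) \approx N(0,\sigma^2)$
$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \approx N(0,1)$

Continuity Correction
$P[\bar{X}_n \le x] \approx \Phi\left(\frac{x + \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$
$P[\bar{X}_n \ge x] \approx 1 - \Phi\left(\frac{x - \frac{1}{2} - \mu}{\sigma/\sqrt{n}}\right)$

Delta Method
$Y_n \approx N\left(\mu, \frac{\sigma^2}{n}\right) \implies \varphi(Y_n) \approx N\left(\varphi(\mu), (\varphi'(\mu))^2\frac{\sigma^2}{n}\right)$


11 Statistical Inference

Let $X_1,\ldots,X_n \overset{iid}{\sim} F$ if not otherwise noted.

11.1 Point Estimation

• Point estimator $\hat{\theta}_n$ of $\theta$ is a RV: $\hat{\theta}_n = g(X_1,\ldots,X_n)$
• $\mathrm{bias}(\hat{\theta}_n) = E[\hat{\theta}_n] - \theta$
• Consistency: $\hat{\theta}_n \xrightarrow{P} \theta$
• Sampling distribution: $F(\hat{\theta}_n)$
• Standard error: $\mathrm{se}(\hat{\theta}_n) = \sqrt{V[\hat{\theta}_n]}$
• Mean squared error: $\mathrm{MSE} = E[(\hat{\theta}_n - \theta)^2] = \mathrm{bias}(\hat{\theta}_n)^2 + V[\hat{\theta}_n]$
• $\lim_{n\to\infty}\mathrm{bias}(\hat{\theta}_n) = 0 \wedge \lim_{n\to\infty}\mathrm{se}(\hat{\theta}_n) = 0 \implies \hat{\theta}_n$ is consistent
• Asymptotic normality: $\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \xrightarrow{D} N(0,1)$
• SLUTZKY'S THEOREM often lets us replace $\mathrm{se}(\hat{\theta}_n)$ by some (weakly) consistent estimator $\hat{\sigma}_n$.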


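11.2 Normal-based Confidence Interval

Suppose $\hat{\theta}_n \approx N(\theta, \widehat{\mathrm{se}}^2)$. Let $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$, i.e., $P[Z > z_{\alpha/2}] = \alpha/2$ and $P[-z_{\alpha/2} < Z < z_{\alpha/2}] = 1 - \alpha$ where $Z \sim N(0,1)$. Then
$C_n = \hat{\theta}_n \pm z_{\alpha/2}\,\widehat{\mathrm{se}}$

11.3 Empirical Distribution Function

Empirical Distribution Function (ECDF)
$\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$, where $I(X_i \le x) = 1$ if $X_i \le x$ and $0$ if $X_i > x$

Properties (for any fixed $x$)

• $E[\hat{F}_n] = F(x)$
• $V[\hat{F}_n] = \frac{F(x)(1 - F(x))}{n}$
• $\mathrm{MSE} = \frac{F(x)(1 - F(x))}{n} \to 0$
• $\hat{F}_n \xrightarrow{P} F(x)$

DVORETZKY-KIEFER-WOLFOWITZ (DKW) Inequality ($X_1,\ldots,X_n \sim F$)
$P\left[\sup_x |F(x) - \hat{F}_n(x)| > \varepsilon\right] \le 2e^{-2n\varepsilon^2}$

Nonparametric $1-\alpha$ confidence band for $F$
$L(x) = \max\{\hat{F}_n - \varepsilon_n, 0\}$
$U(x) = \min\{\hat{F}_n + \varepsilon_n, 1\}$
$\varepsilon_n = \sqrt{\frac{1}{2n}\log\left(\frac{2}{\alpha}\right)}$
$P[L(x) \le F(x) \le U(x)\ \forall x] \ge 1 - \alpha$

11.4 Statistical Functionals

• Statistical functional: $T(F)$
• Plug-in estimator of $\theta = T(F)$: $\hat{\theta}_n = T(\hat{F}_n)$
• Linear functional: $T(F) = \int \varphi(x)\,dF_X(x)$
• Plug-in estimator for linear functional: $T(\hat{F}_n) = \int \varphi(x)\,d\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n}\varphi(X_i)$
• Often: $T(\hat{F}_n) \approx N(T(F), \widehat{\mathrm{se}}^2) \implies T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}}$
• $p^{th}$ quantile: $F^{-1}(p) = \inf\{x : F(x) \ge p\}$
• $\hat{\mu} = \bar{X}_n$
• $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$
• $\hat{\kappa} = \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \hat{\mu})^3}{\hat{\sigma}^3}$
• $\hat{\rho} = \frac{\sum_{i=1}^{n}(X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2}\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y}_n)^2}}$


12 Parametric Inference

Let $\mathfrak{F} = \{f(x;\theta) : \theta \in \Theta\}$ be a parametric model with parameter space $\Theta \subset \mathbb{R}^k$ and parameter $\theta = (\theta_1,\ldots,\theta_k)$.

12.1 Method of Moments

$j^{th}$ moment
$\alpha_j(\theta) = E[X^j] = \int x^j\,dF_X(x)$

$j^{th}$ sample moment
$\hat{\alpha}_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j$

Method of Moments Estimator (MoM)
$\alpha_1(\theta) = \hat{\alpha}_1$
$\alpha_2(\theta) = \hat{\alpha}_2$
$\vdots$
$\alpha_k(\theta) = \hat{\alpha}_k$

Properties of the MoM estimator

• $\hat{\theta}_n$ exists with probability tending to 1
• Consistency: $\hat{\theta}_n \xrightarrow{P} \theta$
• Asymptotic normality: $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{D} N(0,\Sigma)$
  where $\Sigma = g\,E[YY^T]\,g^T$, $Y = (X, X^2, \ldots, X^k)^T$, $g = (g_1,\ldots,g_k)$ and $g_j = \frac{\partial}{\partial\theta}\alpha_j^{-1}(\theta)$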

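12.2 Maximum Likelihood

Likelihood: $L_n : \Theta \to [0,\infty)$
$L_n(\theta) = \prod_{i=1}^{n} f(X_i;\theta)$

Log-likelihood
$\ell_n(\theta) = \log L_n(\theta) = \sum_{i=1}^{n}\log f(X_i;\theta)$

Maximum Likelihood Estimator (MLE)
$L_n(\hat{\theta}_n) = \sup_\theta L_n(\theta)$

Score Function
$s(X;\theta) = \frac{\partial}{\partial\theta}\log f(X;\theta)$

Fisher Information
$I(\theta) = V_\theta[s(X;\theta)]$
$I_n(\theta) = nI(\theta)$

Fisher Information (exponential family)
$I(\theta) = E_\theta\left[-\frac{\partial}{\partial\theta}s(X;\theta)\right]$

Observed Fisher Information
$I_n^{obs}(\theta) = -\frac{\partial^2}{\partial\theta^2}\sum_{i=1}^{n}\log f(X_i;\theta)$

Properties of the MLE

• Consistency: $\hat{\theta}_n \xrightarrow{P} \theta$
• Equivariance: $\hat{\theta}_n$ is the MLE $\implies \varphi(\hat{\theta}_n)$ is the MLE of $\varphi(\theta)$
• Asymptotic normality: with $\mathrm{se} \approx \sqrt{1/I_n(\theta)}$,
  $\frac{\hat{\theta}_n - \theta}{\mathrm{se}} \xrightarrow{D} N(0,1)$
• Asymptotic optimality (or efficiency), i.e., smallest variance for large samples. If $\tilde{\theta}_n$ is any other estimator, the asymptotic relative efficiency is
  $\mathrm{ARE}(\tilde{\theta}_n, \hat{\theta}_n) = \frac{V[\hat{\theta}_n]}{V[\tilde{\theta}_n]} \le 1$
• Approximately the Bayes estimator

12.2.1 Delta Method
If $\tau = \varphi(\theta)$ where $\varphi$ is differentiable and $\varphi'(\theta) \ne 0$:
$\frac{\hat{\tau}_n - \tau}{\widehat{\mathrm{se}}(\hat{\tau})} \xrightarrow{D} N(0,1)$
where $\hat{\tau} = \varphi(\hat{\theta})$ is the MLE of $\tau$ and
$\widehat{\mathrm{se}}(\hat{\tau}) = |\varphi'(\hat{\theta})|\,\widehat{\mathrm{se}}(\hat{\theta}_n)$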

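12.3 Multiparameter Models
Let $\theta = (\theta_1,\ldots,\theta_k)$ and $\hat{\theta} = (\hat{\theta}_1,\ldots,\hat{\theta}_k)$ be the MLE.
$H_{jj} = \frac{\partial^2\ell_n}{\partial\theta_j^2}$, $H_{jk} = \frac{\partial^2\ell_n}{\partial\theta_j\,\partial\theta_k}$

Fisher Information Matrix
$I_n(\theta) = -\begin{pmatrix} E_\theta[H_{11}] & \cdots & E_\theta[H_{1k}] \\ \vdots & \ddots & \vdots \\ E_\theta[H_{k1}] & \cdots & E_\theta[H_{kk}] \end{pmatrix}$

Under appropriate regularity conditions
$(\hat{\theta} - \theta) \approx N(0, J_n)$
with $J_n(\theta) = I_n^{-1}$. Further, if $\hat{\theta}_j$ is the $j^{th}$ component of $\hat{\theta}$, then
$\frac{\hat{\theta}_j - \theta_j}{\widehat{\mathrm{se}}_j} \xrightarrow{D} N(0,1)$
where $\widehat{\mathrm{se}}_j^2 = J_n(j,j)$ and $\mathrm{Cov}[\hat{\theta}_j, \hat{\theta}_k] = J_n(j,k)$.

12.3.1 Multiparameter Delta Method
Let $\tau = \varphi(\theta_1,\ldots,\theta_k)$ be a function and let the gradient of $\varphi$ be
$\nabla\varphi = \left(\frac{\partial\varphi}{\partial\theta_1}, \ldots, \frac{\partial\varphi}{\partial\theta_k}\right)^T$
Suppose $\nabla\varphi|_{\theta=\hat{\theta}} \ne 0$ and $\hat{\tau} = \varphi(\hat{\theta})$. Then,
$\frac{\hat{\tau} - \tau}{\widehat{\mathrm{se}}(\hat{\tau})} \xrightarrow{D} N(0,1)$
where
$\widehat{\mathrm{se}}(\hat{\tau}) = \sqrt{(\hat{\nabla}\varphi)^T\hat{J}_n(\hat{\nabla}\varphi)}$
and $\hat{J}_n = J_n(\hat{\theta})$ and $\hat{\nabla}\varphi = \nabla\varphi|_{\theta=\hat{\theta}}$.

12.4 Parametric Bootstrap
Sample from $f(x;\hat{\theta}_n)$ instead of from $\hat{F}_n$, where $\hat{\theta}_n$ could be the MLE or method of moments estimator.


13 Hypothesis Testing

$H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_1$

Definitions

• Null hypothesis $H_0$
• Alternative hypothesis $H_1$
• Simple hypothesis $\theta = \theta_0$
• Composite hypothesis $\theta > \theta_0$ or $\theta < \theta_0$
• Two-sided test: $H_0 : \theta = \theta_0$ versus $H_1 : \theta \ne \theta_0$
• One-sided test: $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$
• Critical value $c$
• Test statistic $T$
• Rejection region $R = \{x : T(x) > c\}$
• Power function $\beta(\theta) = P[X \in R]$
• Power of a test: $1 - P[\text{Type II error}] = 1 - \beta = \inf_{\theta\in\Theta_1}\beta(\theta)$
• Test size: $\alpha = P[\text{Type I error}] = \sup_{\theta\in\Theta_0}\beta(\theta)$

 | Retain $H_0$ | Reject $H_0$
$H_0$ true | ✓ | Type I error ($\alpha$)
$H_1$ true | Type II error ($\beta$) | ✓ (power)

p-value

• p-value $= \sup_{\theta\in\Theta_0} P_\theta[T(X) \ge T(x)] = \inf\{\alpha : T(x) \in R_\alpha\}$
• p-value $= \sup_{\theta\in\Theta_0} P_\theta[T(X^\star) \ge T(X)] = \inf\{\alpha : T(X) \in R_\alpha\}$, where $P_\theta[T(X^\star) \ge T(X)] = 1 - F_\theta(T(X))$ since $T(X^\star) \sim F_\theta$

p-value | evidence
< 0.01 | very strong evidence against $H_0$
0.01 - 0.05 | strong evidence against $H_0$
0.05 - 0.1 | weak evidence against $H_0$
> 0.1 | little or no evidence against $H_0$

Wald Test

• Two-sided test
• Reject $H_0$ when $|W| > z_{\alpha/2}$ where $W = \frac{\hat{\theta} - \theta_0}{\widehat{\mathrm{se}}}$
• $P[|W| > z_{\alpha/2}] \to \alpha$
• p-value $= P_{\theta_0}[|W| > |w|] \approx P[|Z| > |w|] = 2\Phi(-|w|)$

Likelihood Ratio Test (LRT)

• $T(X) = \frac{\sup_{\theta\in\Theta} L_n(\theta)}{\sup_{\theta\in\Theta_0} L_n(\theta)} = \frac{L_n(\hat{\theta}_n)}{L_n(\hat{\theta}_{n,0})}$
• $\lambda(X) = 2\log T(X) \xrightarrow{D} \chi^2_{r-q}$ (with $r = \dim\Theta$ and $q = \dim\Theta_0$), where $\sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$ with $Z_1,\ldots,Z_k \overset{iid}{\sim} N(0,1)$
• p-value $= P_{\theta_0}[\lambda(X) > \lambda(x)] \approx P[\chi^2_{r-q} > \lambda(x)]$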
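As a concrete illustration of the Wald test described above, the following sketch computes $W$, the two-sided p-value $2\Phi(-|w|)$, and the rejection decision. The data (60 successes in 100 Bernoulli trials, $H_0 : p = 0.5$) and the function names are hypothetical, chosen only for illustration; the standard error uses the plug-in estimate $\sqrt{\hat{p}(1-\hat{p})/n}$.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wald_test(theta_hat, theta0, se_hat, z_crit=1.959963984540054):
    """Two-sided Wald test of H0: theta = theta0.

    z_crit is z_{alpha/2}; the default corresponds to alpha = 0.05.
    Returns (W, p_value, reject_H0).
    """
    w = (theta_hat - theta0) / se_hat
    p_value = 2.0 * normal_cdf(-abs(w))
    return w, p_value, abs(w) > z_crit

# Hypothetical data: 60 successes in n = 100 Bernoulli trials, H0: p = 0.5.
n, successes = 100, 60
p_hat = successes / n
se_hat = math.sqrt(p_hat * (1.0 - p_hat) / n)  # plug-in standard error

w, p_value, reject = wald_test(p_hat, 0.5, se_hat)
print(f"W = {w:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With these numbers $W \approx 2.04$, which just exceeds $z_{0.025} \approx 1.96$, so the test rejects at the 5% level.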


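Multinomial LRT

• Let $\hat{p}_n = \left(\frac{X_1}{n}, \ldots, \frac{X_k}{n}\right)$ be the MLE
• $T(X) = \frac{L_n(\hat{p}_n)}{L_n(p_0)} = \prod_{j=1}^{k}\left(\frac{\hat{p}_j}{p_{0j}}\right)^{X_j}$
• $\lambda(X) = 2\sum_{j=1}^{k} X_j\log\left(\frac{\hat{p}_j}{p_{0j}}\right) \xrightarrow{D} \chi^2_{k-1}$
• The approximate size $\alpha$ LRT rejects $H_0$ when $\lambda(X) \ge \chi^2_{k-1,\alpha}$

Pearson $\chi^2$ Test

• $T = \sum_{j=1}^{k}\frac{(X_j - E[X_j])^2}{E[X_j]}$ where $E[X_j] = np_{0j}$ under $H_0$
• $T \xrightarrow{D} \chi^2_{k-1}$
• p-value $= P[\chi^2_{k-1} > T(x)]$
• Converges in distribution to $\chi^2_{k-1}$ faster than the LRT, hence preferable for small $n$

Independence Testing

• $I$ rows, $J$ columns, $X$ a multinomial sample of size $n$ over the $I \cdot J$ cells
• MLEs unconstrained: $\hat{p}_{ij} = \frac{X_{ij}}{n}$
• MLEs under $H_0$: $\hat{p}_{0ij} = \hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = \frac{X_{i\cdot}}{n}\frac{X_{\cdot j}}{n}$
• LRT: $\lambda = 2\sum_{i=1}^{I}\sum_{j=1}^{J} X_{ij}\log\left(\frac{nX_{ij}}{X_{i\cdot}X_{\cdot j}}\right)$
• Pearson $\chi^2$: $T = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(X_{ij} - E[X_{ij}])^2}{E[X_{ij}]}$
• LRT and Pearson $\xrightarrow{D} \chi^2_\nu$, where $\nu = (I-1)(J-1)$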


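14 Bayesian Inference

BAYES' THEOREM
$f(\theta \mid x^n) = \frac{f(x^n \mid \theta)\,f(\theta)}{f(x^n)} = \frac{f(x^n \mid \theta)\,f(\theta)}{\int f(x^n \mid \theta)\,f(\theta)\,d\theta} \propto L_n(\theta)\,f(\theta)$

Definitions

• $X^n = (X_1,\ldots,X_n)$
• $x^n = (x_1,\ldots,x_n)$
• Prior density $f(\theta)$
• Likelihood $f(x^n \mid \theta)$: joint density of the data. In particular, $X^n$ IID $\implies f(x^n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) = L_n(\theta)$
• Posterior density $f(\theta \mid x^n)$
• Normalizing constant $c_n = f(x^n) = \int f(x^n \mid \theta)\,f(\theta)\,d\theta$
• Kernel: part of a density that depends on $\theta$
• Posterior mean $\bar{\theta}_n = \int\theta\,f(\theta \mid x^n)\,d\theta = \frac{\int\theta\,L_n(\theta)\,f(\theta)\,d\theta}{\int L_n(\theta)\,f(\theta)\,d\theta}$

14.1 Credible Intervals

$1-\alpha$ Posterior Interval
$P[\theta \in (a,b) \mid x^n] = \int_a^b f(\theta \mid x^n)\,d\theta = 1 - \alpha$

$1-\alpha$ Equal-tail Credible Interval
$\int_{-\infty}^{a} f(\theta \mid x^n)\,d\theta = \int_b^{\infty} f(\theta \mid x^n)\,d\theta = \alpha/2$

$1-\alpha$ Highest Posterior Density (HPD) region $R_n$

1. $P[\theta \in R_n] = 1 - \alpha$
2. $R_n = \{\theta : f(\theta \mid x^n) > k\}$ for some $k$
3. $f(\theta \mid x^n)$ is unimodal $\implies R_n$ is an interval

14.2 Function of Parameters

Let $\tau = \varphi(\theta)$ and $A = \{\theta : \varphi(\theta) \le \tau\}$.

Posterior CDF for $\tau$
$H(\tau \mid x^n) = P[\varphi(\theta) \le \tau \mid x^n] = \int_A f(\theta \mid x^n)\,d\theta$

Posterior Density
$h(\tau \mid x^n) = H'(\tau \mid x^n)$

Bayesian Delta Method
$\tau \mid X^n \approx N\left(\varphi(\hat{\theta}), \left(\widehat{\mathrm{se}}\,|\varphi'(\hat{\theta})|\right)^2\right)$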
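14.3 Priors

Choice

• Subjective Bayesianism: prior should incorporate as much detail as possible of the researcher's a priori knowledge, via prior elicitation.
• Objective Bayesianism: prior should incorporate as little detail as possible (non-informative prior).
• Robust Bayesianism: consider various priors and determine sensitivity of our inferences to changes in the prior.

Types

• Flat: $f(\theta) \propto$ constant
• Proper: $\int_{-\infty}^{\infty} f(\theta)\,d\theta = 1$
• Improper: $\int_{-\infty}^{\infty} f(\theta)\,d\theta = \infty$
• JEFFREYS' prior (transformation-invariant): $f(\theta) \propto \sqrt{I(\theta)}$, and in the multiparameter case $f(\theta) \propto \sqrt{\det(I(\theta))}$
• Conjugate: $f(\theta)$ and $f(\theta \mid x^n)$ belong to the same parametric family

14.3.1 Conjugate Priors

Discrete likelihood

Likelihood | Conjugate Prior | Posterior hyperparameters
Bernoulli($p$) | Beta($\alpha,\beta$) | $\alpha + \sum_{i=1}^{n} x_i$, $\beta + n - \sum_{i=1}^{n} x_i$
Binomial($p$) | Beta($\alpha,\beta$) | $\alpha + \sum_{i=1}^{n} x_i$, $\beta + \sum_{i=1}^{n} N_i - \sum_{i=1}^{n} x_i$
Negative Binomial($p$) | Beta($\alpha,\beta$) | $\alpha + rn$, $\beta + \sum_{i=1}^{n} x_i$
Poisson($\lambda$) | Gamma($\alpha,\beta$) | $\alpha + \sum_{i=1}^{n} x_i$, $\beta + n$
Multinomial($p$) | Dirichlet($\alpha$) | $\alpha + \sum_{i=1}^{n} x^{(i)}$
Geometric($p$) | Beta($\alpha,\beta$) | $\alpha + n$, $\beta + \sum_{i=1}^{n} x_i$

Continuous likelihood (subscript $c$ denotes constant)

Likelihood | Conjugate Prior | Posterior hyperparameters
Uniform($0,\theta$) | Pareto($x_m, k$) | $\max\{x_{(n)}, x_m\}$, $k + n$
Exponential($\lambda$) | Gamma($\alpha,\beta$) | $\alpha + n$, $\beta + \sum_{i=1}^{n} x_i$
Normal($\mu,\sigma_c^2$) | Normal($\mu_0,\sigma_0^2$) | $\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^{n} x_i}{\sigma_c^2}\right)\Big/\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2}\right)$, $\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma_c^2}\right)^{-1}$
Normal($\mu_c,\sigma^2$) | Scaled Inverse Chi-square($\nu,\sigma_0^2$) | $\nu + n$, $\frac{\nu\sigma_0^2 + \sum_{i=1}^{n}(x_i - \mu_c)^2}{\nu + n}$
Normal($\mu,\sigma^2$) | Normal-scaled Inverse Gamma($\lambda,\nu,\alpha,\beta$) | $\frac{\nu\lambda + n\bar{x}}{\nu + n}$, $\nu + n$, $\alpha + \frac{n}{2}$, $\beta + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{n\nu(\bar{x} - \lambda)^2}{2(n + \nu)}$
MVN($\mu,\Sigma_c$) | MVN($\mu_0,\Sigma_0$) | $\left(\Sigma_0^{-1} + n\Sigma_c^{-1}\right)^{-1}\left(\Sigma_0^{-1}\mu_0 + n\Sigma_c^{-1}\bar{x}\right)$, $\left(\Sigma_0^{-1} + n\Sigma_c^{-1}\right)^{-1}$
MVN($\mu_c,\Sigma$) | Inverse-Wishart($\kappa,\Psi$) | $n + \kappa$, $\Psi + \sum_{i=1}^{n}(x_i - \mu_c)(x_i - \mu_c)^T$
Pareto($x_{m_c}, k$) | Gamma($\alpha,\beta$) | $\alpha + n$, $\beta + \sum_{i=1}^{n}\log\frac{x_i}{x_{m_c}}$
Pareto($x_m, k_c$) | Pareto($x_0, k_0$) | $x_0$, $k_0 - kn$ where $k_0 > kn$
Gamma($\alpha_c,\beta$) | Gamma($\alpha_0,\beta_0$) | $\alpha_0 + n\alpha_c$, $\beta_0 + \sum_{i=1}^{n} x_i$

14.4 Bayesian Testing

If $H_0 : \theta \in \Theta_0$:
Prior probability $P[H_0] = \int_{\Theta_0} f(\theta)\,d\theta$
Posterior probability $P[H_0 \mid x^n] = \int_{\Theta_0} f(\theta \mid x^n)\,d\theta$

Let $H_0,\ldots,H_{K-1}$ be $K$ hypotheses. Suppose $\theta \sim f(\theta \mid H_k)$,
$P[H_k \mid x^n] = \frac{f(x^n \mid H_k)\,P[H_k]}{\sum_{j=0}^{K-1} f(x^n \mid H_j)\,P[H_j]}$

Marginal Likelihood
$f(x^n \mid H_i) = \int_\Theta f(x^n \mid \theta, H_i)\,f(\theta \mid H_i)\,d\theta$

Posterior Odds (of $H_i$ relative to $H_j$)
$\frac{P[H_i \mid x^n]}{P[H_j \mid x^n]} = \underbrace{\frac{f(x^n \mid H_i)}{f(x^n \mid H_j)}}_{\text{Bayes Factor } BF_{ij}} \times \underbrace{\frac{P[H_i]}{P[H_j]}}_{\text{prior odds}}$

Bayes Factor

$\log_{10} BF_{10}$ | $BF_{10}$ | evidence
0 - 0.5 | 1 - 3.2 | Weak
0.5 - 1 | 3.2 - 10 | Moderate
1 - 2 | 10 - 100 | Strong
> 2 | > 100 | Decisive

$p^* = \frac{\frac{p}{1-p}BF_{10}}{1 + \frac{p}{1-p}BF_{10}}$ where $p = P[H_1]$ and $p^* = P[H_1 \mid x^n]$


15 Exponential Family

Scalar parameter
$f_X(x \mid \theta) = h(x)\exp\{\eta(\theta)T(x) - A(\theta)\} = h(x)\,g(\theta)\exp\{\eta(\theta)T(x)\}$

Vector parameter
$f_X(x \mid \theta) = h(x)\exp\left\{\sum_{i=1}^{s}\eta_i(\theta)T_i(x) - A(\theta)\right\} = h(x)\exp\{\eta(\theta)\cdot T(x) - A(\theta)\} = h(x)\,g(\theta)\exp\{\eta(\theta)\cdot T(x)\}$

Natural form
$f_X(x \mid \eta) = h(x)\exp\{\eta\cdot T(x) - A(\eta)\} = h(x)\,g(\eta)\exp\{\eta\cdot T(x)\} = h(x)\,g(\eta)\exp\{\eta^T T(x)\}$


16 Sampling Methods

16.1 The Bootstrap

Let $T_n = g(X_1,\ldots,X_n)$ be a statistic.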


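1. Estimate $V_F[T_n]$ with $V_{\hat{F}_n}[T_n]$.
2. Approximate $V_{\hat{F}_n}[T_n]$ using simulation:
   (a) Repeat the following $B$ times to get $T^*_{n,1},\ldots,T^*_{n,B}$, an IID sample from the sampling distribution implied by $\hat{F}_n$
       i. Sample uniformly $X^*_1,\ldots,X^*_n \sim \hat{F}_n$.
       ii. Compute $T^*_n = g(X^*_1,\ldots,X^*_n)$.
   (b) Then
       $v_{boot} = \hat{V}_{\hat{F}_n} = \frac{1}{B}\sum_{b=1}^{B}\left(T^*_{n,b} - \frac{1}{B}\sum_{r=1}^{B}T^*_{n,r}\right)^2$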
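The simulation steps above translate almost line by line into code. The sketch below is a minimal illustration using only the Python standard library; the function name `bootstrap_variance`, the data values, and the choice of the median as the statistic are assumptions made for the example.

```python
import random
import statistics

def bootstrap_variance(data, statistic, B=2000, seed=0):
    """Estimate V[T_n] by resampling from the empirical distribution.

    Returns (v_boot, replicates), where replicates holds T*_{n,1..B}.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(B):
        # Step (a)i: draw X*_1..X*_n uniformly, with replacement, from the data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step (a)ii: recompute the statistic on the resample.
        replicates.append(statistic(resample))
    # Step (b): variance of the replicates around their own mean (divisor B).
    mean_rep = sum(replicates) / B
    v_boot = sum((t - mean_rep) ** 2 for t in replicates) / B
    return v_boot, replicates

# Hypothetical sample; the median has no simple closed-form variance.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
v_boot, reps = bootstrap_variance(data, statistics.median)
print("bootstrap se of the median:", v_boot ** 0.5)
```

For the sample mean the result can be sanity-checked against the plug-in value $\hat{\sigma}^2/n$ (with divisor $n$ in $\hat{\sigma}^2$), which the bootstrap estimate should approach as $B$ grows.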


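16.1.1 Bootstrap Confidence Intervals

Normal-based Interval
$T_n \pm z_{\alpha/2}\,\widehat{\mathrm{se}}_{boot}$

Pivotal Interval

1. Location parameter $\theta = T(F)$
2. Pivot $R_n = \hat{\theta}_n - \theta$
3. Let $H(r) = P[R_n \le r]$ be the CDF of $R_n$
4. Let $R^*_{n,b} = \hat{\theta}^*_{n,b} - \hat{\theta}_n$. Approximate $H$ using bootstrap:
   $\hat{H}(r) = \frac{1}{B}\sum_{b=1}^{B} I(R^*_{n,b} \le r)$
5. Let $\theta^*_\beta$ denote the $\beta$ sample quantile of $(\hat{\theta}^*_{n,1},\ldots,\hat{\theta}^*_{n,B})$
6. Let $r^*_\beta$ denote the $\beta$ sample quantile of $(R^*_{n,1},\ldots,R^*_{n,B})$, i.e., $r^*_\beta = \theta^*_\beta - \hat{\theta}_n$
7. Then, an approximate $1-\alpha$ confidence interval is $C_n = (\hat{a}, \hat{b})$ with
   $\hat{a} = \hat{\theta}_n - \hat{H}^{-1}\left(1 - \frac{\alpha}{2}\right) = \hat{\theta}_n - r^*_{1-\alpha/2} = 2\hat{\theta}_n - \theta^*_{1-\alpha/2}$
   $\hat{b} = \hat{\theta}_n - \hat{H}^{-1}\left(\frac{\alpha}{2}\right) = \hat{\theta}_n - r^*_{\alpha/2} = 2\hat{\theta}_n - \theta^*_{\alpha/2}$

Percentile Interval
$C_n = \left(\theta^*_{\alpha/2}, \theta^*_{1-\alpha/2}\right)$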


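16.2 Rejection Sampling

Setup

• We can easily sample from $g(\theta)$
• We want to sample from $h(\theta)$, but it is difficult
• We know $h(\theta)$ up to a proportional constant: $h(\theta) = \frac{k(\theta)}{\int k(\theta)\,d\theta}$
• Envelope condition: we can find $M > 0$ such that $k(\theta) \le M g(\theta)$ for all $\theta$

Algorithm

1. Draw $\theta^{cand} \sim g(\theta)$
2. Generate $u \sim \mathrm{Unif}(0,1)$
3. Accept $\theta^{cand}$ if $u \le \frac{k(\theta^{cand})}{M g(\theta^{cand})}$
4. Repeat until $B$ values of $\theta^{cand}$ have been accepted

Example

• We can easily sample from the prior $g(\theta) = f(\theta)$
• Target is the posterior $h(\theta) \propto k(\theta) = f(x^n \mid \theta)\,f(\theta)$
• Envelope condition: $f(x^n \mid \theta) \le f(x^n \mid \hat{\theta}_n) = L_n(\hat{\theta}_n) \equiv M$
• Algorithm
  1. Draw $\theta^{cand} \sim f(\theta)$
  2. Generate $u \sim \mathrm{Unif}(0,1)$
  3. Accept $\theta^{cand}$ if $u \le \frac{L_n(\theta^{cand})}{L_n(\hat{\theta}_n)}$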
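The accept/reject loop above can be sketched in a few lines. The example below is illustrative only: it assumes an unnormalized target $k(\theta) = \theta^2(1-\theta)$ on $[0,1]$ (the kernel of a Beta(3, 2) density), a Unif(0, 1) proposal $g$, and the envelope constant $M = 4/27$, which is the maximum of $k$ attained at $\theta = 2/3$. The function and argument names are hypothetical.

```python
import random

def rejection_sample(k, draw_g, g_density, M, B, seed=0):
    """Draw B samples from h proportional to k, using proposal g with k <= M*g."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < B:
        cand = draw_g(rng)                        # 1. candidate from g
        u = rng.random()                          # 2. u ~ Unif(0, 1)
        if u <= k(cand) / (M * g_density(cand)):  # 3. accept/reject step
            accepted.append(cand)
    return accepted                               # 4. stop after B acceptances

# Assumed target kernel: Beta(3, 2) up to its normalizing constant.
k = lambda th: th ** 2 * (1.0 - th)
M = 4.0 / 27.0  # max of k on [0, 1], so k(theta) <= M * 1 everywhere

samples = rejection_sample(
    k,
    draw_g=lambda rng: rng.random(),  # proposal g = Unif(0, 1)
    g_density=lambda th: 1.0,
    M=M,
    B=5000,
)
sample_mean = sum(samples) / len(samples)
print("sample mean:", sample_mean, "(Beta(3, 2) mean is 0.6)")
```

The acceptance rate equals $\int k(\theta)\,d\theta / M$, so a tighter envelope (smaller valid $M$) wastes fewer candidates.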


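16.3 Importance Sampling

Sample from an importance function $g$ rather than target density $h$.

Algorithm to obtain an approximation to $E[q(\theta) \mid x^n]$:

1. Sample from the prior $\theta_1,\ldots,\theta_B \overset{iid}{\sim} f(\theta)$
2. For each $i = 1,\ldots,B$, calculate $w_i = \frac{L_n(\theta_i)}{\sum_{i=1}^{B} L_n(\theta_i)}$
3. $E[q(\theta) \mid x^n] \approx \sum_{i=1}^{B} q(\theta_i)\,w_i$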
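A minimal sketch of the three steps above, under stated assumptions: the data are hypothetical (7 successes in 10 Bernoulli trials), the prior is Unif(0, 1), and the quantity of interest is the posterior mean, i.e. $q(\theta) = \theta$. With this prior the exact posterior is Beta(8, 4), whose mean $8/12$ gives a reference value for the approximation.

```python
import random

def posterior_expectation(q, likelihood, draw_prior, B=20000, seed=0):
    """Approximate E[q(theta) | data] using prior draws weighted by likelihood."""
    rng = random.Random(seed)
    thetas = [draw_prior(rng) for _ in range(B)]   # 1. theta_i ~ f(theta)
    raw = [likelihood(t) for t in thetas]          # L_n(theta_i)
    total = sum(raw)
    weights = [r / total for r in raw]             # 2. normalized weights w_i
    return sum(q(t) * w for t, w in zip(thetas, weights))  # 3. weighted sum

# Hypothetical data: s = 7 successes in n = 10 Bernoulli trials.
n, s = 10, 7
likelihood = lambda p: p ** s * (1.0 - p) ** (n - s)

estimate = posterior_expectation(
    q=lambda p: p,
    likelihood=likelihood,
    draw_prior=lambda rng: rng.random(),  # Unif(0, 1) prior
)
print("estimate:", estimate, "exact:", 8 / 12)
```

Sampling from the prior is simple but can be inefficient when the likelihood is sharply peaked, because most weights are then close to zero.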


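17 Decision Theory

Definitions

• Unknown quantity affecting our decision: $\theta \in \Theta$
• Decision rule: synonymous for an estimator $\hat{\theta}$
• Action $a \in \mathcal{A}$: possible value of the decision rule. In the estimation context, the action is just an estimate of $\theta$, $\hat{\theta}(x)$.
• Loss function $L$: consequences of taking action $a$ when the true state is $\theta$, or discrepancy between $\theta$ and $\hat{\theta}$; $L : \Theta \times \mathcal{A} \to [-k, \infty)$.

Loss functions

• Squared error loss: $L(\theta,a) = (\theta - a)^2$
• Linear loss: $L(\theta,a) = K_1(\theta - a)$ if $a - \theta < 0$, and $K_2(a - \theta)$ if $a - \theta \ge 0$
• Absolute error loss: $L(\theta,a) = |\theta - a|$ (linear loss with $K_1 = K_2$)
• $L_p$ loss: $L(\theta,a) = |\theta - a|^p$
• Zero-one loss: $L(\theta,a) = 0$ if $a = \theta$, and $1$ if $a \ne \theta$

17.1 Risk

Posterior Risk
$r(\hat{\theta} \mid x) = \int L(\theta, \hat{\theta}(x))\,f(\theta \mid x)\,d\theta = E_{\theta|X}\left[L(\theta, \hat{\theta}(x))\right]$

(Frequentist) Risk
$R(\theta, \hat{\theta}) = \int L(\theta, \hat{\theta}(x))\,f(x \mid \theta)\,dx = E_{X|\theta}\left[L(\theta, \hat{\theta}(X))\right]$

Bayes Risk
$r(f, \hat{\theta}) = \int\!\!\int L(\theta, \hat{\theta}(x))\,f(x,\theta)\,dx\,d\theta = E_{\theta,X}\left[L(\theta, \hat{\theta}(X))\right]$
$r(f, \hat{\theta}) = E_\theta\left[E_{X|\theta}\left[L(\theta, \hat{\theta}(X))\right]\right] = E_\theta\left[R(\theta, \hat{\theta})\right]$
$r(f, \hat{\theta}) = E_X\left[E_{\theta|X}\left[L(\theta, \hat{\theta}(X))\right]\right] = E_X\left[r(\hat{\theta} \mid X)\right]$

17.2 Admissibility

• $\hat{\theta}'$ dominates $\hat{\theta}$ if
  $\forall\theta : R(\theta, \hat{\theta}') \le R(\theta, \hat{\theta})$
  $\exists\theta : R(\theta, \hat{\theta}') < R(\theta, \hat{\theta})$
• $\hat{\theta}$ is inadmissible if there is at least one other estimator $\hat{\theta}'$ that dominates it. Otherwise it is called admissible.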
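17.3 Bayes Rule

Bayes Rule (or Bayes Estimator)

• $r(f, \hat{\theta}) = \inf_{\tilde{\theta}} r(f, \tilde{\theta})$
• $\hat{\theta}(x) = \inf r(\hat{\theta} \mid x)\ \forall x \implies r(f, \hat{\theta}) = \int r(\hat{\theta} \mid x)\,f(x)\,dx$

Theorems

• Squared error loss: posterior mean
• Absolute error loss: posterior median
• Zero-one loss: posterior mode

17.4 Minimax Rules

Maximum Risk
$\bar{R}(\hat{\theta}) = \sup_\theta R(\theta, \hat{\theta})$, $\bar{R}(a) = \sup_\theta R(\theta, a)$

Minimax Rule
$\sup_\theta R(\theta, \hat{\theta}) = \inf_{\tilde{\theta}} \bar{R}(\tilde{\theta}) = \inf_{\tilde{\theta}}\sup_\theta R(\theta, \tilde{\theta})$
$\hat{\theta}$ = Bayes rule $\wedge\ \exists c : R(\theta, \hat{\theta}) = c \implies \hat{\theta}$ is minimax

Least Favorable Prior
$\hat{\theta}^f$ = Bayes rule $\wedge\ R(\theta, \hat{\theta}^f) \le r(f, \hat{\theta}^f)\ \forall\theta \implies \hat{\theta}^f$ is minimax and $f$ is least favorable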


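18 Linear Regression

Definitions

• Response variable $Y$
• Covariate $X$ (aka predictor variable or feature)

18.1 Simple Linear Regression

Model
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, $E[\epsilon_i \mid X_i] = 0$, $V[\epsilon_i \mid X_i] = \sigma^2$

Fitted Line
$\hat{r}(x) = \hat{\beta}_0 + \hat{\beta}_1 x$

Predicted (Fitted) Values
$\hat{Y}_i = \hat{r}(X_i)$

Residuals
$\hat{\epsilon}_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$

Residual Sums of Squares (RSS)
$\mathrm{RSS}(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^{n}\hat{\epsilon}_i^2$

Least Square Estimates
$\hat{\beta}^T = (\hat{\beta}_0, \hat{\beta}_1)^T$ minimizes $\mathrm{RSS}$
$\hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1\bar{X}_n$
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\sum_{i=1}^{n}(X_i - \bar{X}_n)^2} = \frac{\sum_{i=1}^{n} X_iY_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}$
$V[\hat{\beta} \mid X^n] = \frac{\sigma^2}{n s_X^2}\begin{pmatrix} n^{-1}\sum_{i=1}^{n} X_i^2 & -\bar{X}_n \\ -\bar{X}_n & 1 \end{pmatrix}$
$\widehat{\mathrm{se}}(\hat{\beta}_0) = \frac{\hat{\sigma}}{s_X\sqrt{n}}\sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n}}$
$\widehat{\mathrm{se}}(\hat{\beta}_1) = \frac{\hat{\sigma}}{s_X\sqrt{n}}$
where $s_X^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$ and $\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{\epsilon}_i^2$ is an (unbiased) estimate of $\sigma^2$. Further properties:

• Consistency: $\hat{\beta}_0 \xrightarrow{P} \beta_0$ and $\hat{\beta}_1 \xrightarrow{P} \beta_1$
• Asymptotic normality: $\frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}(\hat{\beta}_0)} \xrightarrow{D} N(0,1)$ and $\frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)} \xrightarrow{D} N(0,1)$
• Approximate $1-\alpha$ confidence intervals for $\beta_0$ and $\beta_1$ are $\hat{\beta}_0 \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_0)$ and $\hat{\beta}_1 \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_1)$
• The Wald test for testing $H_0 : \beta_1 = 0$ vs. $H_1 : \beta_1 \ne 0$ is: reject $H_0$ if $|W| > z_{\alpha/2}$ where $W = \hat{\beta}_1/\widehat{\mathrm{se}}(\hat{\beta}_1)$.

$R^2$
$R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n}\hat{\epsilon}_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$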
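The least squares formulas above can be sketched directly in code. The function name, the returned dictionary keys, and the five data points below are illustrative assumptions; the computation uses $\sum_i (X_i - \bar{X}_n)^2 = n s_X^2$, so $\hat{\sigma}/(s_X\sqrt{n})$ becomes $\hat{\sigma}/\sqrt{S_{xx}}$.

```python
import math

def simple_linear_regression(x, y):
    """Least squares fit of y = b0 + b1 * x with standard errors and R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals n * s_X^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    rss = sum(e ** 2 for e in residuals)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    sigma2_hat = rss / (n - 2)                # unbiased estimate of sigma^2
    se_b1 = math.sqrt(sigma2_hat / sxx)       # sigma_hat / (s_X * sqrt(n))
    se_b0 = se_b1 * math.sqrt(sum(xi ** 2 for xi in x) / n)
    return {
        "b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
        "rss": rss, "r2": 1.0 - rss / tss,
        "wald": b1 / se_b1,                   # W for H0: beta_1 = 0
    }

# Hypothetical data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.1, 7.9, 10.2]
fit = simple_linear_regression(x, y)
print(fit)
```

An approximate 95% interval for the slope is then `fit["b1"] ± 1.96 * fit["se_b1"]`, following the confidence interval formula above.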


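Likelihood
$L = \prod_{i=1}^{n} f(X_i, Y_i) = \prod_{i=1}^{n} f_X(X_i) \times \prod_{i=1}^{n} f_{Y|X}(Y_i \mid X_i) = L_1 \times L_2$
$L_1 = \prod_{i=1}^{n} f_X(X_i)$
$L_2 = \prod_{i=1}^{n} f_{Y|X}(Y_i \mid X_i) \propto \sigma^{-n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - (\beta_0 + \beta_1 X_i)\right)^2\right\}$

Under the assumption of Normality, the least squares estimator is also the MLE.

18.2 Prediction

Observe $X = x_*$ of the covariate and want to predict the outcome $Y_*$.
$\hat{Y}_* = \hat{\beta}_0 + \hat{\beta}_1 x_*$
$V[\hat{Y}_*] = V[\hat{\beta}_0] + x_*^2\,V[\hat{\beta}_1] + 2x_*\,\mathrm{Cov}[\hat{\beta}_0, \hat{\beta}_1]$

18.3 Multiple Regression

$Y = X\beta + \epsilon$
where
$X = \begin{pmatrix} X_{11} & \cdots & X_{1k} \\ \vdots & \ddots & \vdots \\ X_{n1} & \cdots & X_{nk} \end{pmatrix}$, $\beta = \begin{pmatrix}\beta_1 \\ \vdots \\ \beta_k\end{pmatrix}$, $\epsilon = \begin{pmatrix}\epsilon_1 \\ \vdots \\ \epsilon_n\end{pmatrix}$

Likelihood
$L(\mu, \Sigma) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\mathrm{RSS}\right\}$
$\mathrm{RSS} = (y - X\beta)^T(y - X\beta) = \|Y - X\beta\|^2 = \sum_{i=1}^{N}(Y_i - x_i^T\beta)^2$

If the $(k \times k)$ matrix $X^TX$ is invertible,
$\hat{\beta} = (X^TX)^{-1}X^TY$
$V[\hat{\beta} \mid X^n] = \sigma^2(X^TX)^{-1}$
$\hat{\beta} \approx N\left(\beta, \sigma^2(X^TX)^{-1}\right)$

Estimate regression function
$\hat{r}(x) = \sum_{j=1}^{k}\hat{\beta}_j x_j$

Unbiased estimate for $\sigma^2$
$\hat{\sigma}^2 = \frac{1}{n-k}\sum_{i=1}^{n}\hat{\epsilon}_i^2$, $\hat{\epsilon} = X\hat{\beta} - Y$

MLE
$\hat{\sigma}^2_{mle} = \frac{n-k}{n}\hat{\sigma}^2$

$1-\alpha$ Confidence Interval
$\hat{\beta}_j \pm z_{\alpha/2}\,\widehat{\mathrm{se}}(\hat{\beta}_j)$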


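18.4 Model Selection

Consider predicting a new observation $Y^*$ for covariates $X^*$ and let $S \subset J$ denote a subset of the covariates in the model, where $|S| = k$ and $|J| = n$.

Issues

• Underfitting: too few covariates yields high bias
• Overfitting: too many covariates yields high variance

Procedure

1. Assign a score to each model
2. Search through all models to find the one with the highest score

Hypothesis Testing
$H_0 : \beta_j = 0$ vs. $H_1 : \beta_j \ne 0$ for all $j \in J$

Mean Squared Prediction Error (MSPE)
$\mathrm{MSPE} = E\left[(\hat{Y}(S) - Y^*)^2\right]$

Prediction Risk
$R(S) = \sum_{i=1}^{n}\mathrm{MSPE}_i = \sum_{i=1}^{n} E\left[(\hat{Y}_i(S) - Y_i^*)^2\right]$

Training Error
$\hat{R}_{tr}(S) = \sum_{i=1}^{n}(\hat{Y}_i(S) - Y_i)^2$

$R^2$
$R^2(S) = 1 - \frac{\mathrm{RSS}(S)}{\mathrm{TSS}} = 1 - \frac{\hat{R}_{tr}(S)}{\mathrm{TSS}} = 1 - \frac{\sum_{i=1}^{n}(\hat{Y}_i(S) - Y_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$

The training error is a downward-biased estimate of the prediction risk.
$E\left[\hat{R}_{tr}(S)\right] < R(S)$
$\mathrm{bias}(\hat{R}_{tr}(S)) = E\left[\hat{R}_{tr}(S)\right] - R(S) = -2\sum_{i=1}^{n}\mathrm{Cov}\left[\hat{Y}_i, Y_i\right]$

Adjusted $R^2$
$R^2(S) = 1 - \frac{n-1}{n-k}\frac{\mathrm{RSS}}{\mathrm{TSS}}$

MALLOW'S $C_p$ statistic
$\hat{R}(S) = \hat{R}_{tr}(S) + 2k\hat{\sigma}^2$ = lack of fit + complexity penalty

Akaike Information Criterion (AIC)
$\mathrm{AIC}(S) = \ell_n(\hat{\beta}_S, \hat{\sigma}_S^2) - k$

Bayesian Information Criterion (BIC)
$\mathrm{BIC}(S) = \ell_n(\hat{\beta}_S, \hat{\sigma}_S^2) - \frac{k}{2}\log n$

Validation and Training
$\hat{R}_V(S) = \sum_{i=1}^{m}(\hat{Y}_i^*(S) - Y_i^*)^2$, $m = |\{\text{validation data}\}|$, often $\frac{n}{4}$ or $\frac{n}{2}$

Leave-one-out Cross-validation
$\hat{R}_{CV}(S) = \sum_{i=1}^{n}(Y_i - \hat{Y}_{(i)})^2 = \sum_{i=1}^{n}\left(\frac{Y_i - \hat{Y}_i(S)}{1 - U_{ii}(S)}\right)^2$
$U(S) = X_S(X_S^TX_S)^{-1}X_S^T$ ("hat matrix")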
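The leave-one-out identity above avoids refitting the model $n$ times. The sketch below checks it numerically for simple linear regression with an intercept, where the diagonal of the hat matrix has the closed form $U_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$. The data and function names are assumptions for illustration.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def loocv_shortcut(x, y):
    """Leave-one-out CV score from one fit, via the hat-matrix diagonal."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    total = 0.0
    for xi, yi in zip(x, y):
        u_ii = 1.0 / n + (xi - x_bar) ** 2 / sxx  # diagonal entry U_ii
        total += ((yi - (b0 + b1 * xi)) / (1.0 - u_ii)) ** 2
    return total

def loocv_brute_force(x, y):
    """Leave-one-out CV score by refitting n times (reference computation)."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

# Hypothetical data.
x = [0.5, 1.0, 2.0, 3.5, 4.0, 5.5, 7.0]
y = [1.1, 1.9, 4.2, 6.8, 8.1, 11.3, 13.8]
print(loocv_shortcut(x, y), loocv_brute_force(x, y))
```

Both functions return the same score up to floating-point error, which is the content of the identity.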


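19 Non-parametric Function Estimation

19.1 Density Estimation

Estimate $f(x)$, where $P[X \in A] = \int_A f(x)\,dx$.

Integrated Square Error (ISE)
$L(f, \hat{f}_n) = \int\left(f(x) - \hat{f}_n(x)\right)^2 dx = J(h) + \int f^2(x)\,dx$

Frequentist Risk
$R(f, \hat{f}_n) = E\left[L(f, \hat{f}_n)\right] = \int b^2(x)\,dx + \int v(x)\,dx$
$b(x) = E\left[\hat{f}_n(x)\right] - f(x)$
$v(x) = V\left[\hat{f}_n(x)\right]$

19.1.1 Histograms

Definitions

• Number of bins $m$
• Binwidth $h = \frac{1}{m}$
• Bin $B_j$ has $\nu_j$ observations
• Define $\hat{p}_j = \nu_j/n$ and $p_j = \int_{B_j} f(u)\,du$

Histogram Estimator
$\hat{f}_n(x) = \sum_{j=1}^{m}\frac{\hat{p}_j}{h}\,I(x \in B_j)$
$R(\hat{f}_n, f) \approx \frac{h^2}{12}\int (f'(u))^2\,du + \frac{1}{nh}$
$h^* = \frac{1}{n^{1/3}}\left(\frac{6}{\int (f'(u))^2\,du}\right)^{1/3}$
$R^*(\hat{f}_n, f) \approx \frac{C}{n^{2/3}}$, $C = \left(\frac{3}{4}\right)^{2/3}\left(\int (f'(u))^2\,du\right)^{1/3}$

Cross-validation estimate of $E[J(h)]$
$\hat{J}_{CV}(h) = \int \hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n}\hat{f}_{(-i)}(X_i) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h}\sum_{j=1}^{m}\hat{p}_j^2$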
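19.1.2 Kernel Density Estimator (KDE)

Kernel $K$

• $K(x) \ge 0$
• $\int K(x)\,dx = 1$
• $\int xK(x)\,dx = 0$
• $\int x^2K(x)\,dx \equiv \sigma_K^2 > 0$

KDE
$\hat{f}_n(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h}K\left(\frac{x - X_i}{h}\right)$
$R(f, \hat{f}_n) \approx \frac{1}{4}(h\sigma_K)^4\int (f''(x))^2\,dx + \frac{1}{nh}\int K^2(x)\,dx$
$h^* = \frac{c_1^{-2/5}\,c_2^{1/5}\,c_3^{-1/5}}{n^{1/5}}$, $c_1 = \sigma_K^2$, $c_2 = \int K^2(x)\,dx$, $c_3 = \int (f''(x))^2\,dx$
$R^*(f, \hat{f}_n) = \frac{c_4}{n^{4/5}}$, $c_4 = \frac{5}{4}(\sigma_K^2)^{2/5}\left(\int K^2(x)\,dx\right)^{4/5}\left(\int (f''(x))^2\,dx\right)^{1/5}$

Epanechnikov Kernel
$K(x) = \frac{3}{4\sqrt{5}}\left(1 - \frac{x^2}{5}\right)$ if $|x| < \sqrt{5}$, and $0$ otherwise

Cross-validation estimate of $E[J(h)]$
$\hat{J}_{CV}(h) = \int \hat{f}_n^2(x)\,dx - \frac{2}{n}\sum_{i=1}^{n}\hat{f}_{(-i)}(X_i) \approx \frac{1}{hn^2}\sum_{i=1}^{n}\sum_{j=1}^{n}K^*\left(\frac{X_i - X_j}{h}\right) + \frac{2}{nh}K(0)$
$K^*(x) = K^{(2)}(x) - 2K(x)$, $K^{(2)}(x) = \int K(x - y)K(y)\,dy$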
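To make the KDE formula concrete, the sketch below implements $\hat{f}_n(x) = \frac{1}{nh}\sum_i K\left(\frac{x - X_i}{h}\right)$ with the Epanechnikov kernel in the scaling given above. The sample values, the bandwidth $h = 0.5$, and the grid step are arbitrary illustrative choices, not recommendations; the Riemann sum at the end only checks that the estimate integrates to approximately 1.

```python
import math

SQRT5 = math.sqrt(5.0)

def epanechnikov(u):
    """Epanechnikov kernel scaled to have unit variance."""
    if abs(u) < SQRT5:
        return 3.0 / (4.0 * SQRT5) * (1.0 - u * u / 5.0)
    return 0.0

def kde(x, sample, h, kernel=epanechnikov):
    """Kernel density estimate at x with bandwidth h."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical sample and bandwidth.
sample = [1.2, 1.9, 2.4, 2.5, 3.1, 3.8, 4.4, 5.0]
h = 0.5

# Riemann-sum check that the estimate integrates to about 1.
lo = min(sample) - h * SQRT5 - 0.1
hi = max(sample) + h * SQRT5 + 0.1
dx = 0.001
steps = int((hi - lo) / dx)
integral = sum(kde(lo + i * dx, sample, h) for i in range(steps)) * dx
print("integral of the KDE:", integral)
```

Because each kernel bump has support $(-h\sqrt{5}, h\sqrt{5})$ around its data point, the estimate is exactly zero outside $[\min X_i - h\sqrt{5}, \max X_i + h\sqrt{5}]$.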


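19.2 Non-parametric Regression

Estimate $r(x)$, where $r(x) = E[Y \mid X = x]$. Consider pairs of points $(x_1, Y_1), \ldots, (x_n, Y_n)$ related by
$Y_i = r(x_i) + \epsilon_i$, $E[\epsilon_i] = 0$, $V[\epsilon_i] = \sigma^2$

k-nearest Neighbor Estimator
$\hat{r}(x) = \frac{1}{k}\sum_{i : x_i \in N_k(x)} Y_i$ where $N_k(x) = \{k \text{ values of } x_1,\ldots,x_n \text{ closest to } x\}$

Nadaraya-Watson Kernel Estimator
$\hat{r}(x) = \sum_{i=1}^{n} w_i(x)\,Y_i$
$w_i(x) = \frac{K\left(\frac{x - x_i}{h}\right)}{\sum_{j=1}^{n}K\left(\frac{x - x_j}{h}\right)} \in [0,1]$
$R(\hat{r}_n, r) \approx \frac{h^4}{4}\left(\int x^2K(x)\,dx\right)^2\int\left(r''(x) + 2r'(x)\frac{f'(x)}{f(x)}\right)^2 dx + \int\frac{\sigma^2\int K^2(x)\,dx}{nhf(x)}\,dx$
$h^* \approx \frac{c_1}{n^{1/5}}$
$R^*(\hat{r}_n, r) \approx \frac{c_2}{n^{4/5}}$

Cross-validation estimate of $E[J(h)]$
$\hat{J}_{CV}(h) = \sum_{i=1}^{n}\left(Y_i - \hat{r}_{(-i)}(x_i)\right)^2 = \sum_{i=1}^{n}\frac{(Y_i - \hat{r}(x_i))^2}{\left(1 - \frac{K(0)}{\sum_{j=1}^{n}K\left(\frac{x_i - x_j}{h}\right)}\right)^2}$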


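19.3 Smoothing Using Orthogonal Functions

Approximation
$r(x) = \sum_{j=1}^{\infty}\beta_j\phi_j(x) \approx \sum_{j=1}^{J}\beta_j\phi_j(x)$

Multivariate Regression
$Y = \Phi\beta + \eta$
where $\eta_i = \epsilon_i$ and $\Phi = \begin{pmatrix}\phi_0(x_1) & \cdots & \phi_J(x_1) \\ \vdots & \ddots & \vdots \\ \phi_0(x_n) & \cdots & \phi_J(x_n)\end{pmatrix}$

Least Squares Estimator
$\hat{\beta} = (\Phi^T\Phi)^{-1}\Phi^TY \approx \frac{1}{n}\Phi^TY$ (for equally spaced observations only)

Cross-validation estimate of $E[J(h)]$
$\hat{R}_{CV}(J) = \sum_{i=1}^{n}\left(Y_i - \sum_{j=1}^{J}\phi_j(x_i)\,\hat{\beta}_{j,(-i)}\right)^2$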


20 Stochastic Processes

Stochastic Process
$\{X_t : t \in T\}$ with $T = \{0, \pm 1, \ldots\} = \mathbb{Z}$ (discrete) or $T = [0, \infty)$ (continuous)

• Notations: $X_t$, $X(t)$
• State space $\mathcal{X}$
• Index set $T$

20.1 Markov Chains

Markov Chain
$P[X_n = x \mid X_0, \ldots, X_{n-1}] = P[X_n = x \mid X_{n-1}]$ for all $n \in T$, $x \in \mathcal{X}$

Transition probabilities
$p_{ij} \equiv P[X_{n+1} = j \mid X_n = i]$
$p_{ij}(n) \equiv P[X_{m+n} = j \mid X_m = i]$ (n-step)

Transition matrix $P$ (n-step: $P_n$)

• $(i,j)$ element is $p_{ij}$
• $p_{ij} \ge 0$
• $\sum_j p_{ij} = 1$

CHAPMAN-KOLMOGOROV
$p_{ij}(m + n) = \sum_k p_{ik}(m)\,p_{kj}(n)$
$P_{m+n} = P_mP_n$
$P_n = P \times \cdots \times P = P^n$

Marginal probability
$\mu_n = (\mu_n(1), \ldots, \mu_n(N))$ where $\mu_n(i) = P[X_n = i]$
$\mu_0 \triangleq$ initial distribution
$\mu_n = \mu_0P^n$
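The relations $P_{m+n} = P_mP_n$ and $\mu_n = \mu_0P^n$ can be checked on a small chain. The two-state transition matrix below is a hypothetical example; its stationary distribution is $(5/6, 1/6)$, which the marginal distribution approaches as $n$ grows.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(P, n):
    """n-step transition matrix P^n (n >= 0) by repeated multiplication."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

def marginal(mu0, P, n):
    """Marginal distribution mu_n = mu_0 P^n as a row vector."""
    Pn = mat_pow(P, n)
    return [sum(mu0[i] * Pn[i][j] for i in range(len(mu0)))
            for j in range(len(mu0))]

# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
mu0 = [1.0, 0.0]  # start in state 1 with probability 1

P5 = mat_pow(P, 5)
P2P3 = mat_mul(mat_pow(P, 2), mat_pow(P, 3))  # Chapman-Kolmogorov: P_5 = P_2 P_3
mu50 = marginal(mu0, P, 50)
print(P5, P2P3, mu50)
```

For long horizons, exponentiation by squaring would reduce the number of matrix products from $n$ to $O(\log n)$; the plain loop is kept here for readability.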


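20.2 Poisson Processes

Poisson Process

• $\{X_t : t \in [0, \infty)\}$ = number of events up to and including time $t$
• $X_0 = 0$
• Independent increments: $\forall t_0 < \cdots < t_n : X_{t_1} - X_{t_0} \perp\!\!\!\perp \cdots \perp\!\!\!\perp X_{t_n} - X_{t_{n-1}}$
• Intensity function $\lambda(t)$
  - $P[X_{t+h} - X_t = 1] = \lambda(t)h + o(h)$
  - $P[X_{t+h} - X_t = 2] = o(h)$
• $X_{s+t} - X_s \sim \mathrm{Po}(m(s+t) - m(s))$ where $m(t) = \int_0^t\lambda(s)\,ds$

Homogeneous Poisson Process
$\lambda(t) \equiv \lambda \implies X_t \sim \mathrm{Po}(\lambda t)$, $\lambda > 0$

Waiting Times
$W_t$ := time at which $X_t$ occurs
$W_t \sim \mathrm{Gamma}\left(t, \frac{1}{\lambda}\right)$

Interarrival Times
$S_t = W_{t+1} - W_t$
$S_t \sim \mathrm{Exp}\left(\frac{1}{\lambda}\right)$

[Figure: timeline showing the interarrival time $S_t$ as the gap between consecutive waiting times $W_{t-1}$ and $W_t$.]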
21 Time Series

Mean function
$\mu_{xt} = E[x_t] = \int_{-\infty}^{\infty} x\,f_t(x)\,dx$

Autocovariance function
$\gamma_x(s,t) = E[(x_s - \mu_s)(x_t - \mu_t)] = E[x_sx_t] - \mu_s\mu_t$
$\gamma_x(t,t) = E[(x_t - \mu_t)^2] = V[x_t]$

Autocorrelation function (ACF)
$\rho(s,t) = \frac{\mathrm{Cov}[x_s, x_t]}{\sqrt{V[x_s]\,V[x_t]}} = \frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\,\gamma(t,t)}}$

Cross-covariance function (CCV)
$\gamma_{xy}(s,t) = E[(x_s - \mu_{x_s})(y_t - \mu_{y_t})]$

Cross-correlation function (CCF)
$\rho_{xy}(s,t) = \frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s)\,\gamma_y(t,t)}}$

Backshift operator
$B^k(x_t) = x_{t-k}$

Difference operator
$\nabla^d = (1 - B)^d$

White Noise

• $w_t \sim \mathrm{wn}(0, \sigma_w^2)$
• Gaussian: $w_t \overset{iid}{\sim} N(0, \sigma_w^2)$
• $E[w_t] = 0$, $t \in T$
• $V[w_t] = \sigma_w^2$, $t \in T$
• $\gamma_w(s,t) = 0$, $s \ne t \wedge s,t \in T$

Random Walk

• Drift $\delta$
• $x_t = \delta t + \sum_{j=1}^{t} w_j$
• $E[x_t] = \delta t$

Symmetric Moving Average
$m_t = \sum_{j=-k}^{k} a_jx_{t-j}$ where $a_j = a_{-j} \ge 0$ and $\sum_{j=-k}^{k} a_j = 1$

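21.1 Stationary Time Series

Strictly stationary
$P[x_{t_1} \le c_1, \ldots, x_{t_k} \le c_k] = P[x_{t_1+h} \le c_1, \ldots, x_{t_k+h} \le c_k]$
for all $k \in \mathbb{N}$ and $t_k, c_k, h \in \mathbb{Z}$

Weakly stationary

• $E[x_t^2] < \infty$ for all $t \in \mathbb{Z}$
• $E[x_t] = m$ for all $t \in \mathbb{Z}$
• $\gamma_x(s,t) = \gamma_x(s+r, t+r)$ for all $r,s,t \in \mathbb{Z}$

Autocovariance function

• $\gamma(h) = E[(x_{t+h} - \mu)(x_t - \mu)]$ for all $h \in \mathbb{Z}$
• $\gamma(0) = E[(x_t - \mu)^2]$
• $\gamma(0) \ge 0$
• $\gamma(0) \ge |\gamma(h)|$
• $\gamma(h) = \gamma(-h)$

Autocorrelation function (ACF)
$\rho_x(h) = \frac{\mathrm{Cov}[x_{t+h}, x_t]}{\sqrt{V[x_{t+h}]\,V[x_t]}} = \frac{\gamma(t+h, t)}{\sqrt{\gamma(t+h, t+h)\,\gamma(t,t)}} = \frac{\gamma(h)}{\gamma(0)}$

Jointly stationary time series
$\gamma_{xy}(h) = E[(x_{t+h} - \mu_x)(y_t - \mu_y)]$
$\rho_{xy}(h) = \frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\,\gamma_y(0)}}$

Linear Process
$x_t = \mu + \sum_{j=-\infty}^{\infty}\psi_jw_{t-j}$ where $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$
$\gamma(h) = \sigma_w^2\sum_{j=-\infty}^{\infty}\psi_{j+h}\psi_j$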
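21.2 Estimation of Correlation

Sample mean
$\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$

Sample variance
$V[\bar{x}] = \frac{1}{n}\sum_{h=-n}^{n}\left(1 - \frac{|h|}{n}\right)\gamma_x(h)$

Sample autocovariance function
$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar{x})(x_t - \bar{x})$

Sample autocorrelation function
$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}$

Sample cross-covariance function
$\hat{\gamma}_{xy}(h) = \frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h} - \bar{x})(y_t - \bar{y})$

Sample cross-correlation function
$\hat{\rho}_{xy}(h) = \frac{\hat{\gamma}_{xy}(h)}{\sqrt{\hat{\gamma}_x(0)\,\hat{\gamma}_y(0)}}$

Properties

• $\sigma_{\hat{\rho}_x(h)} = \frac{1}{\sqrt{n}}$ if $x_t$ is white noise
• $\sigma_{\hat{\rho}_{xy}(h)} = \frac{1}{\sqrt{n}}$ if $x_t$ or $y_t$ is white noise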
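The sample autocovariance and autocorrelation formulas above are short enough to implement directly. In the sketch below the function names, the toy series, and the simulated Gaussian white noise are illustrative assumptions; the $\pm 2/\sqrt{n}$ band follows from the white-noise property $\sigma_{\hat{\rho}_x(h)} = 1/\sqrt{n}$.

```python
import math
import random

def sample_autocovariance(x, h):
    """Sample autocovariance at lag h, with divisor n as in the formula above."""
    n = len(x)
    h = abs(h)
    x_bar = sum(x) / n
    return sum((x[t + h] - x_bar) * (x[t] - x_bar) for t in range(n - h)) / n

def sample_acf(x, h):
    """Sample autocorrelation rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)

# Small deterministic series that is easy to verify by hand.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print("toy rho_hat(1):", sample_acf(toy, 1))

# Simulated Gaussian white noise: the lag-h autocorrelations should mostly
# fall inside the approximate band +/- 2 / sqrt(n).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
band = 2.0 / math.sqrt(len(noise))
acf_values = [sample_acf(noise, h) for h in range(1, 11)]
print("band:", band)
print("white-noise ACF at lags 1..10:", acf_values)
```

Using the divisor $n$ rather than $n - h$ keeps the estimated autocovariance sequence non-negative definite, which is why it is the conventional choice.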
21.3 Non-Stationary Time Series

Classical decomposition model
$x_t = \mu_t + s_t + w_t$

• $\mu_t$ = trend
• $s_t$ = seasonal component
• $w_t$ = random noise term

21.3.1 Detrending

Least Squares

1. Choose trend model, e.g., $\mu_t = \beta_0 + \beta_1t + \beta_2t^2$
2. Minimize RSS to obtain trend estimate $\hat{\mu}_t = \hat{\beta}_0 + \hat{\beta}_1t + \hat{\beta}_2t^2$
3. Residuals $\triangleq$ noise $w_t$

Moving average

• The low-pass filter $v_t$ is a symmetric moving average $m_t$ with $a_j = \frac{1}{2k+1}$:
  $v_t = \frac{1}{2k+1}\sum_{i=-k}^{k} x_{t-i}$
• If $\frac{1}{2k+1}\sum_{j=-k}^{k} w_{t-j} \approx 0$, a linear trend function $\mu_t = \beta_0 + \beta_1t$ passes without distortion

Differencing

• $\mu_t = \beta_0 + \beta_1t \implies \nabla\mu_t = \beta_1$, so $\nabla x_t = \beta_1 + \nabla w_t$


21.4 ARIMA models 


Autoregressive polynomial 


$(z) =1— d,2—*-+ — baz, zECAG, #0 


Autoregressive operator 
O(B) =1- @1B-+-+~ 6)BP 


Autoregressive model order p, AR (p) 


Ee = O21 +++ +bpXtptu, —> O(B)a, =v 


AR (1) 


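• $x_t = \phi^k x_{t-k} + \sum_{j=0}^{k-1} \phi^j w_{t-j} \;\xrightarrow{k \to \infty,\ |\phi| < 1}\; \underbrace{\sum_{j=0}^{\infty} \phi^j w_{t-j}}_{\text{linear process}}$

• $\mathbb{E}[x_t] = \sum_{j=0}^{\infty} \phi^j \mathbb{E}[w_{t-j}] = 0$

• $\gamma(h) = \mathrm{Cov}[x_{t+h}, x_t] = \frac{\sigma_w^2 \phi^h}{1 - \phi^2}, \quad h \ge 0$
• $\rho(h) = \frac{\gamma(h)}{\gamma(0)} = \phi^h$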
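The AR(1) autocovariance can be cross-checked against the linear-process formula $\gamma(h) = \sigma_w^2 \sum_j \psi_{j+h} \psi_j$ with $\psi_j = \phi^j$. The sketch below truncates the infinite sum at a fixed number of terms, which is an assumption of the example; the function names are hypothetical.

```python
def ar1_acvf_closed_form(phi, sigma2, h):
    """gamma(h) = sigma_w^2 * phi^|h| / (1 - phi^2) for a causal AR(1)."""
    return sigma2 * phi ** abs(h) / (1.0 - phi ** 2)


def ar1_acvf_linear_process(phi, sigma2, h, terms=200):
    """Truncated sigma_w^2 * sum_j psi_{j+h} * psi_j with psi_j = phi^j."""
    h = abs(h)
    return sigma2 * sum(phi ** (j + h) * phi ** j for j in range(terms))


phi, sigma2 = 0.6, 2.0
for h in range(4):
    print(h, ar1_acvf_closed_form(phi, sigma2, h),
          ar1_acvf_linear_process(phi, sigma2, h))
```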
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Moving average polynomial 
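\[ \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q, \qquad z \in \mathbb{C} \wedge \theta_q \ne 0 \]
Moving average operator
\[ \theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q \]
MA (q) (moving average model order q)

\[ x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q} \iff x_t = \theta(B) w_t \]

\[ \mathbb{E}[x_t] = \sum_{j=0}^{q} \theta_j \mathbb{E}[w_{t-j}] = 0 \]

\[ \gamma(h) = \mathrm{Cov}[x_{t+h}, x_t] = \begin{cases} \sigma_w^2 \sum_{j=0}^{q-h} \theta_j \theta_{j+h} & 0 \le h \le q \\ 0 & h > q \end{cases} \qquad (\theta_0 = 1) \]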
MA (1) 
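\[ x_t = w_t + \theta w_{t-1} \]
\[ \gamma(h) = \begin{cases} (1 + \theta^2)\sigma_w^2 & h = 0 \\ \theta \sigma_w^2 & h = 1 \\ 0 & h > 1 \end{cases} \]
\[ \rho(h) = \begin{cases} \frac{\theta}{1 + \theta^2} & h = 1 \\ 0 & h > 1 \end{cases} \]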
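The MA(q) autocovariance formula translates directly into code. In this sketch the name `ma_acvf` and the convention of passing `theta` as the list $[\theta_1, \dots, \theta_q]$ (with $\theta_0 = 1$ prepended internally) are assumptions made for illustration.

```python
def ma_acvf(theta, sigma2, h):
    """Autocovariance of an MA(q) process at lag h.

    theta holds theta_1..theta_q; theta_0 = 1 is added here.
    """
    coeffs = [1.0] + list(theta)
    q = len(theta)
    h = abs(h)
    if h > q:
        return 0.0
    return sigma2 * sum(coeffs[j] * coeffs[j + h] for j in range(q - h + 1))


# MA(1) with theta = 0.5 and unit noise variance.
gamma0 = ma_acvf([0.5], 1.0, 0)
gamma1 = ma_acvf([0.5], 1.0, 1)
print(gamma0, gamma1, gamma1 / gamma0)
```

The last printed value is $\rho(1) = \theta / (1 + \theta^2)$ for the MA(1) case.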
ARMA (p, q) 


\[ x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q} \]
\[ \phi(B) x_t = \theta(B) w_t \]

Partial autocorrelation function (PACF) 

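• $x_i^{h-1} \triangleq$ regression of $x_i$ on $\{x_{h-1}, x_{h-2}, \dots, x_1\}$

• $\phi_{hh} = \mathrm{corr}(x_h - x_h^{h-1},\; x_0 - x_0^{h-1}), \quad h \ge 2$

• E.g., $\phi_{11} = \mathrm{corr}(x_1, x_0) = \rho(1)$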
ARIMA (p, d, q)

\[ \nabla^d x_t = (1 - B)^d x_t \text{ is ARMA } (p, q) \]
\[ \phi(B)(1 - B)^d x_t = \theta(B) w_t \]

Exponentially Weighted Moving Average (EWMA) 


\[ x_t = x_{t-1} + w_t - \lambda w_{t-1} \]
\[ x_t = \sum_{j=1}^{\infty} (1 - \lambda) \lambda^{j-1} x_{t-j} + w_t \quad \text{when } |\lambda| < 1 \]
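The weights $(1 - \lambda)\lambda^{j-1}$ in the EWMA representation decay geometrically and sum to one. The sketch below computes them and the weighted sum over the available history; truncating the infinite sum at the length of the observed series is an assumption of this example, as are the function names.

```python
def ewma_weights(lam, n):
    """Weights (1 - lam) * lam^(j - 1) for j = 1..n."""
    return [(1.0 - lam) * lam ** (j - 1) for j in range(1, n + 1)]


def ewma_weighted_history(x, lam):
    """Truncated sum_{j=1}^{n} (1 - lam) * lam^(j-1) * x_{n+1-j}.

    x is ordered oldest to newest, so the newest value gets the largest weight.
    """
    weights = ewma_weights(lam, len(x))
    return sum(w * xv for w, xv in zip(weights, reversed(x)))


print(sum(ewma_weights(0.3, 50)))             # close to 1
print(ewma_weighted_history([5.0] * 50, 0.3))  # close to 5
```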


Seasonal ARIMA 


• Denoted by ARIMA $(p, d, q) \times (P, D, Q)_s$
• $\Phi_P(B^s)\,\phi(B)\,\nabla_s^D \nabla^d x_t = \delta + \Theta_Q(B^s)\,\theta(B)\,w_t$


21.4.1 Causality and Invertibility 


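ARMA (p, q) is causal (future-independent) $\iff \exists \{\psi_j\}: \sum_{j=0}^{\infty} |\psi_j| < \infty$ such that
\[ x_t = \sum_{j=0}^{\infty} \psi_j w_{t-j} = \psi(B) w_t \]

ARMA (p, q) is invertible $\iff \exists \{\pi_j\}: \sum_{j=0}^{\infty} |\pi_j| < \infty$ such that
\[ \pi(B) x_t = \sum_{j=0}^{\infty} \pi_j x_{t-j} = w_t \]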


Properties 


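• ARMA (p, q) causal $\iff$ roots of $\phi(z)$ lie outside the unit circle

\[ \psi(z) = \sum_{j=0}^{\infty} \psi_j z^j = \frac{\theta(z)}{\phi(z)}, \qquad |z| \le 1 \]

• ARMA (p, q) invertible $\iff$ roots of $\theta(z)$ lie outside the unit circle

\[ \pi(z) = \sum_{j=0}^{\infty} \pi_j z^j = \frac{\phi(z)}{\theta(z)}, \qquad |z| \le 1 \]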
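The $\psi$- and $\pi$-weights are the power-series coefficients of $\theta(z)/\phi(z)$ and $\phi(z)/\theta(z)$, so they can be obtained by matching coefficients term by term. The following sketch does this with a generic series division; all function names are hypothetical, and it does not check the root conditions itself.

```python
def power_series_ratio(num, den, n_terms):
    """First n_terms coefficients of num(z) / den(z), assuming den[0] == 1.

    num and den are coefficient lists in increasing powers of z.
    """
    coeffs = []
    for j in range(n_terms):
        value = num[j] if j < len(num) else 0.0
        for k in range(1, min(j, len(den) - 1) + 1):
            value -= den[k] * coeffs[j - k]
        coeffs.append(value)
    return coeffs


def arma_polynomials(phi, theta):
    """Coefficient lists of phi(z) = 1 - sum phi_k z^k and theta(z) = 1 + sum theta_k z^k."""
    return [1.0] + [-p for p in phi], [1.0] + list(theta)


def psi_weights(phi, theta, n_terms):
    phi_poly, theta_poly = arma_polynomials(phi, theta)
    return power_series_ratio(theta_poly, phi_poly, n_terms)


def pi_weights(phi, theta, n_terms):
    phi_poly, theta_poly = arma_polynomials(phi, theta)
    return power_series_ratio(phi_poly, theta_poly, n_terms)


# ARMA(1, 1) with phi_1 = 0.5 and theta_1 = 0.4.
print(psi_weights([0.5], [0.4], 4))
print(pi_weights([0.5], [0.4], 4))
```

For a causal and invertible model both sequences decay, so their absolute sums converge.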


Behavior of the ACF and PACF for causal and invertible ARMA models 


     | AR (p)               | MA (q)               | ARMA (p, q)
ACF  | tails off            | cuts off after lag q | tails off
PACF | cuts off after lag p | tails off            | tails off


21.5 Spectral Analysis 


Periodic process 


\[ x_t = A \cos(2\pi\omega t + \phi) = U_1 \cos(2\pi\omega t) + U_2 \sin(2\pi\omega t) \]

• Frequency index $\omega$ (cycles per unit time), period $1/\omega$

• Amplitude $A$
• Phase $\phi$

• $U_1 = A \cos\phi$ and $U_2 = -A \sin\phi$ are often normally distributed RVs


Periodic mixture 


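\[ x_t = \sum_{k=1}^{q} \left( U_{k1} \cos(2\pi\omega_k t) + U_{k2} \sin(2\pi\omega_k t) \right) \]

• $U_{k1}, U_{k2}$, for $k = 1, \dots, q$, are independent zero-mean RVs with variances $\sigma_k^2$

• $\gamma(h) = \sum_{k=1}^{q} \sigma_k^2 \cos(2\pi\omega_k h)$
• $\gamma(0) = \mathbb{E}[x_t^2] = \sum_{k=1}^{q} \sigma_k^2$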
Spectral representation of a periodic process 


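\[ \gamma(h) = \sigma^2 \cos(2\pi\omega_0 h) = \frac{\sigma^2}{2} e^{-2\pi i \omega_0 h} + \frac{\sigma^2}{2} e^{2\pi i \omega_0 h} = \int_{-1/2}^{1/2} e^{2\pi i \omega h} \, dF(\omega) \]
Spectral distribution function
\[ F(\omega) = \begin{cases} 0 & \omega < -\omega_0 \\ \sigma^2 / 2 & -\omega_0 \le \omega < \omega_0 \\ \sigma^2 & \omega \ge \omega_0 \end{cases} \]
• $F(-\infty) = F(-1/2) = 0$
• $F(\infty) = F(1/2) = \gamma(0)$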
Spectral density 
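\[ f(\omega) = \sum_{h=-\infty}^{\infty} \gamma(h) e^{-2\pi i \omega h}, \qquad -\frac{1}{2} \le \omega \le \frac{1}{2} \]
• Needs $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty \implies \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i \omega h} f(\omega) \, d\omega, \quad h = 0, \pm 1, \dots$
• $f(\omega) \ge 0$
• $f(\omega) = f(-\omega)$
• $f(\omega) = f(1 - \omega)$
• $\gamma(0) = \mathbb{V}[x_t] = \int_{-1/2}^{1/2} f(\omega) \, d\omega$
• White noise: $f_w(\omega) = \sigma_w^2$
• ARMA (p, q), $\phi(B) x_t = \theta(B) w_t$:

\[ f_x(\omega) = \sigma_w^2 \frac{|\theta(e^{-2\pi i \omega})|^2}{|\phi(e^{-2\pi i \omega})|^2} \]

where $\phi(z) = 1 - \sum_{k=1}^{p} \phi_k z^k$ and $\theta(z) = 1 + \sum_{k=1}^{q} \theta_k z^k$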
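The ARMA spectral density can be evaluated directly from the two polynomials. This is a minimal sketch using only `cmath`; the function name and the convention of passing coefficient lists $[\phi_1, \dots, \phi_p]$ and $[\theta_1, \dots, \theta_q]$ are assumptions for illustration.

```python
import cmath


def arma_spectral_density(phi, theta, sigma2, omega):
    """f_x(omega) = sigma_w^2 |theta(z)|^2 / |phi(z)|^2 at z = exp(-2 pi i omega)."""
    z = cmath.exp(-2j * cmath.pi * omega)
    phi_z = 1 - sum(p * z ** (k + 1) for k, p in enumerate(phi))
    theta_z = 1 + sum(t * z ** (k + 1) for k, t in enumerate(theta))
    return sigma2 * abs(theta_z) ** 2 / abs(phi_z) ** 2


# White noise has a flat spectrum; an AR(1) with phi = 0.5 peaks at omega = 0.
print(arma_spectral_density([], [], 2.0, 0.2))
print(arma_spectral_density([0.5], [], 1.0, 0.0))
print(arma_spectral_density([0.5], [], 1.0, 0.5))
```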


Discrete Fourier Transform (DFT) 
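\[ d(\omega_j) = n^{-1/2} \sum_{t=1}^{n} x_t e^{-2\pi i \omega_j t} \]
Fourier/Fundamental frequencies
\[ \omega_j = j / n \]

Inverse DFT
\[ x_t = n^{-1/2} \sum_{j=0}^{n-1} d(\omega_j) e^{2\pi i \omega_j t} \]

Periodogram
\[ I(j/n) = |d(j/n)|^2 \]

Scaled Periodogram

\[ P(j/n) = \frac{4}{n} I(j/n) = \left( \frac{2}{n} \sum_{t=1}^{n} x_t \cos(2\pi t j / n) \right)^2 + \left( \frac{2}{n} \sum_{t=1}^{n} x_t \sin(2\pi t j / n) \right)^2 \]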
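The DFT, periodogram, and scaled periodogram can be sketched with a direct O(n²) sum, which is enough to see the definitions at work. The function names and the test signal are illustrative assumptions; for a pure cosine of amplitude A at a Fourier frequency, the scaled periodogram at that frequency equals A².

```python
import cmath
import math


def dft(x, freq_index):
    """d(j/n) = n^(-1/2) * sum_{t=1}^{n} x_t * exp(-2 pi i t j / n)."""
    n = len(x)
    total = sum(xt * cmath.exp(-2j * cmath.pi * freq_index * t / n)
                for t, xt in enumerate(x, start=1))
    return total / math.sqrt(n)


def periodogram(x, freq_index):
    """I(j/n) = |d(j/n)|^2."""
    return abs(dft(x, freq_index)) ** 2


def scaled_periodogram(x, freq_index):
    """P(j/n) = (4/n) * I(j/n)."""
    return 4.0 / len(x) * periodogram(x, freq_index)


# Hypothetical signal: amplitude 2 at Fourier frequency 3/16.
n = 16
signal = [2.0 * math.cos(2 * math.pi * 3 * t / n) for t in range(1, n + 1)]
print([round(scaled_periodogram(signal, j), 6) for j in range(1, 8)])
```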


22 Math 


22.1 Gamma Function 


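• Ordinary: $\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t} \, dt$
• Upper incomplete: $\Gamma(s, x) = \int_x^{\infty} t^{s-1} e^{-t} \, dt$
• Lower incomplete: $\gamma(s, x) = \int_0^{x} t^{s-1} e^{-t} \, dt$
• $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha), \quad \alpha > 0$
• $\Gamma(n) = (n - 1)!, \quad n \in \mathbb{N}$
• $\Gamma(1/2) = \sqrt{\pi}$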
22.2 Beta Function 
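• Ordinary: $B(x, y) = B(y, x) = \int_0^1 t^{x-1} (1 - t)^{y-1} \, dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)}$
• Incomplete: $B(x; a, b) = \int_0^x t^{a-1} (1 - t)^{b-1} \, dt$
• Regularized incomplete:
\[ I_x(a, b) = \frac{B(x; a, b)}{B(a, b)} \overset{a, b \in \mathbb{N}}{=} \sum_{j=a}^{a+b-1} \frac{(a + b - 1)!}{j! \, (a + b - 1 - j)!} x^j (1 - x)^{a+b-1-j} \]
• $I_0(a, b) = 0 \qquad I_1(a, b) = 1$
• $I_x(a, b) = 1 - I_{1-x}(b, a)$

Stirling numbers, 2nd kind
\[ \left\{ {n \atop k} \right\} = k \left\{ {n-1 \atop k} \right\} + \left\{ {n-1 \atop k-1} \right\}, \quad 1 \le k \le n \qquad \left\{ {n \atop 0} \right\} = \begin{cases} 1 & n = 0 \\ 0 & \text{else} \end{cases} \]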
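A quick numerical check of the Gamma and Beta identities, using only the standard library. The helper names `beta` and `reg_inc_beta_int` are assumptions of this sketch, and the regularized incomplete Beta function is implemented only for positive integer a and b via the finite sum.

```python
import math


def beta(x, y):
    """B(x, y) = Gamma(x) * Gamma(y) / Gamma(x + y)."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


def reg_inc_beta_int(x, a, b):
    """I_x(a, b) for positive integers a, b via the finite binomial sum."""
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(a, n + 1))


print(math.gamma(5), math.factorial(4))        # Gamma(n) = (n - 1)!
print(math.gamma(0.5) ** 2, math.pi)           # Gamma(1/2) = sqrt(pi)
print(beta(2, 3))                              # 1/12
print(reg_inc_beta_int(0.3, 2, 3), 1 - reg_inc_beta_int(0.7, 3, 2))
```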
22.3 Series 
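Finite
• $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$
• $\sum_{k=1}^{n} (2k - 1) = n^2$
• $\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}$
• $\sum_{k=1}^{n} k^3 = \left( \frac{n(n+1)}{2} \right)^2$
• $\sum_{k=0}^{n} c^k = \frac{c^{n+1} - 1}{c - 1}, \quad c \ne 1$
Binomial
• $\sum_{k=0}^{n} \binom{n}{k} = 2^n$
• $\sum_{k=0}^{n} \binom{r+k}{k} = \binom{r+n+1}{n}$
• $\sum_{k=0}^{n} \binom{k}{m} = \binom{n+1}{m+1}$
• VANDERMONDE's Identity: $\sum_{k=0}^{r} \binom{m}{k} \binom{n}{r-k} = \binom{m+n}{r}$
• Binomial Theorem: $\sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k = (a + b)^n$
Infinite
• $\sum_{k=0}^{\infty} p^k = \frac{1}{1-p}, \quad \sum_{k=1}^{\infty} p^k = \frac{p}{1-p}, \quad |p| < 1$
• $\sum_{k=0}^{\infty} k p^{k-1} = \frac{d}{dp} \left( \sum_{k=0}^{\infty} p^k \right) = \frac{d}{dp} \left( \frac{1}{1-p} \right) = \frac{1}{(1-p)^2}, \quad |p| < 1$
• $\sum_{k=0}^{\infty} \binom{r+k-1}{k} x^k = (1 - x)^{-r}, \quad r \in \mathbb{N}^+$
• $\sum_{k=0}^{\infty} \binom{\alpha}{k} p^k = (1 + p)^{\alpha}, \quad |p| < 1, \; \alpha \in \mathbb{C}$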
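Several of the finite and binomial identities above can be spot-checked numerically with `math.comb`. The helper names and the particular parameter values are arbitrary choices for this sketch.

```python
from math import comb


def vandermonde_lhs(m, n, r):
    """Left-hand side of Vandermonde's identity."""
    return sum(comb(m, k) * comb(n, r - k) for k in range(r + 1))


def upper_sum(n, m):
    """sum_{k=0}^{n} C(k, m), which should equal C(n + 1, m + 1)."""
    return sum(comb(k, m) for k in range(n + 1))


def parallel_sum(r, n):
    """sum_{k=0}^{n} C(r + k, k), which should equal C(r + n + 1, n)."""
    return sum(comb(r + k, k) for k in range(n + 1))


print(vandermonde_lhs(5, 7, 4), comb(12, 4))
print(upper_sum(10, 3), comb(11, 4))
print(parallel_sum(3, 6), comb(10, 6))
print(sum(k ** 3 for k in range(1, 11)), (10 * 11 // 2) ** 2)
```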
22.4 Combinatorics 
Sampling
k out of n | w/o replacement | w/ replacement
ordered    | $n^{\underline{k}} = \prod_{i=0}^{k-1} (n - i) = \frac{n!}{(n-k)!}$ | $n^k$
unordered  | $\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n!}{k!\,(n-k)!}$ | $\binom{n-1+k}{k} = \binom{n-1+k}{n-1}$


Partitions 
\[ P_{n+k,k} = \sum_{i=1}^{k} P_{n,i} \qquad k > n: \; P_{n,k} = 0 \qquad n \ge 1: \; P_{n,0} = 0, \; P_{0,0} = 1 \]
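Balls and Urns $f: B \to U$, D = distinguishable, ¬D = indistinguishable.
|B| = n, |U| = m | f arbitrary | f injective | f surjective | f bijective
B: D, U: D   | $m^n$ | $m^{\underline{n}}$ if $m \ge n$, 0 else | $m! \left\{ {n \atop m} \right\}$ | $n!$ if $m = n$, 0 else
B: ¬D, U: D  | $\binom{m+n-1}{n}$ | $\binom{m}{n}$ | $\binom{n-1}{m-1}$ | 1 if $m = n$, 0 else
B: D, U: ¬D  | $\sum_{k=1}^{m} \left\{ {n \atop k} \right\}$ | 1 if $m \ge n$, 0 else | $\left\{ {n \atop m} \right\}$ | 1 if $m = n$, 0 else
B: ¬D, U: ¬D | $\sum_{k=1}^{m} P_{n,k}$ | 1 if $m \ge n$, 0 else | $P_{n,m}$ | 1 if $m = n$, 0 else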
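Two entries of the table (distinguishable balls and urns) can be verified by brute force for small n and m. The sketch below uses the standard recurrence for Stirling numbers of the second kind; the function names are hypothetical and the enumeration is only practical for small inputs.

```python
from functools import lru_cache
from itertools import product
from math import factorial


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_functions(n, m, surjective=False):
    """Count maps from n labelled balls to m labelled urns by enumeration."""
    count = 0
    for assignment in product(range(m), repeat=n):
        if not surjective or len(set(assignment)) == m:
            count += 1
    return count


print(count_functions(4, 2), 2 ** 4)
print(count_functions(5, 3, surjective=True), factorial(3) * stirling2(5, 3))
```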
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